A very interesting slide deck comparing sync and async IO

Thu Mar 3 22:55:29 PST 2016

On 03/03/16 19:31, Andrei Alexandrescu wrote:
> https://www.mailinator.com/tymaPaulMultithreaded.pdf
>
> Andrei

You just stepped on a pet peeve of mine. NIO isn't async IO. It's 
non-blocking IO. Many people (including Microsoft's MSDN) confuse the 
two, but they are completely and utterly different, and in fact, quite 
orthogonal (though the blocking async combination makes no sense, which 
is possibly the source of the confusion).

Blocking IO means that if a request cannot be served immediately, your 
thread blocks until it can be served. If you try to read from a socket 
with no data, the call to "read" will only return once data arrives.

Synchronous IO means that a call only returns once it can be said, on 
some level, to have been performed. If you call "send" on a socket in 
synchronous mode, then, depending on the socket type, it will return 
after it has sent out the information (without waiting for an ACK), or, 
at the very least, after having copied the information into the kernel's 
buffers. If you call an asynchronous version of send, the copying of 
information into the socket may take place after the system call has 
already returned.
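
To make the synchronous half of that concrete, here is a minimal sketch 
in Python. The socketpair and the variable names are my own choices for 
illustration; the only point is that once "send" has returned, the 
kernel holds its own copy of the data, so overwriting the buffer 
afterwards cannot change what the peer receives.

```python
import socket

# A connected pair of local sockets; no network access is needed.
sender, receiver = socket.socketpair()

buf = bytearray(b"hello")
sent = sender.send(buf)   # synchronous: returns once the kernel has its own copy

# The buffer is ours again, so scribbling over it cannot affect what was sent.
buf[:] = b"XXXXX"

received = receiver.recv(16)
print(sent, received)     # 5 b'hello'

sender.close()
receiver.close()
```

With an asynchronous send, that overwrite would be a bug until the 
completion is reported, because the copy may not have happened yet.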

In Linux, for example, non-blocking mode is activated with fcntl 
(setting O_NONBLOCK). Asynchronous operations are separate and distinct 
system calls (aio_*, io_* and vmsplice).
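
A small sketch of the non-blocking side, again in Python and using a 
pipe rather than a socket purely for convenience (the variable names 
are mine). It sets O_NONBLOCK through fcntl and shows that a read on an 
empty descriptor fails immediately with EAGAIN instead of putting the 
thread to sleep. Note that the read itself is still synchronous: when 
it does succeed, the data is already in your buffer on return.

```python
import errno
import fcntl
import os

r, w = os.pipe()

# Switch the read end to non-blocking mode: fetch the flags, OR in O_NONBLOCK.
flags = fcntl.fcntl(r, fcntl.F_GETFL)
fcntl.fcntl(r, fcntl.F_SETFL, flags | os.O_NONBLOCK)

# Empty pipe: a blocking read would hang here; the non-blocking one fails at once.
try:
    os.read(r, 16)
    would_block = False
except BlockingIOError as e:
    would_block = e.errno in (errno.EAGAIN, errno.EWOULDBLOCK)

# Once data is available, the very same call simply returns it.
os.write(w, b"ping")
data = os.read(r, 16)
print(would_block, data)   # True b'ping'

os.close(r)
os.close(w)
```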

Using non-blocking mode introduces some challenges, but is, generally 
speaking, not very complicated (particularly when using fibers). Using 
asynchronous IO is considerably more complicated, as you need to keep 
track of which of your buffers currently have async operations in 
flight. Not many systems are built around async IO.

This may change, as most of the user-space-only, low-latency IO 
solutions (such as DPDK) are async by virtue of their hardware 
requirements.

On a completely different note, a colleague and I started a proof of 
concept to disprove the claim that blocking+threads is slower. We did 
manage to service several tens of thousands of simultaneous connections 
using the one-thread-per-client model, proving that the mere context 
switch is not the problem here. Sadly, we never got around to bringing 
the code and the tests up to the level where we could publish 
something, but, at least in the case where a system call is needed in 
order to decide when to switch fibers, I dispute the claim that 
non-blocking is inherently faster.
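
For readers who have not seen the one-thread-per-client shape, a toy 
sketch follows. This is not the proof of concept mentioned above and it 
measures nothing; "serve" and "run_clients" are hypothetical names, and 
socketpairs stand in for accepted TCP connections so the example stays 
self-contained. Each connection gets its own thread that does nothing 
but plain blocking recv/sendall calls.

```python
import socket
import threading

def serve(conn):
    """Handle one client with plain blocking calls until it disconnects."""
    with conn:
        while True:
            data = conn.recv(1024)   # blocks this thread only
            if not data:             # empty read: the peer closed its end
                return
            conn.sendall(data.upper())

def run_clients(n):
    """Start n handler threads, talk to each once, and collect the replies."""
    clients, threads = [], []
    for _ in range(n):
        client, server_side = socket.socketpair()
        t = threading.Thread(target=serve, args=(server_side,))
        t.start()
        clients.append(client)
        threads.append(t)

    replies = []
    for i, client in enumerate(clients):
        client.sendall(f"client {i}".encode())
        replies.append(client.recv(1024))
        client.close()               # lets the handler thread exit

    for t in threads:
        t.join()
    return replies

replies = run_clients(50)
print(len(replies), replies[0])      # 50 b'CLIENT 0'
```

The handler contains no readiness checks and no state machine, which is 
the appeal of this model; whether it keeps up at tens of thousands of 
connections is exactly the question that needs real measurements.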

Shachar