A very interesting slide deck comparing sync and async IO
Sönke Ludwig via Digitalmars-d
digitalmars-d at puremagic.com
Sat Mar 5 00:55:14 PST 2016
On 03.03.2016 at 18:31, Andrei Alexandrescu wrote:
> https://www.mailinator.com/tymaPaulMultithreaded.pdf
>
> Andrei
A few points that come to mind:
- Comparing arbitrary high-level libraries is bound to give results that
mainly measure abstraction overhead and non-optimal system API use.
Comparing on a JVM instead of bare metal might skew the results further
(e.g. some JIT optimizations not kicking in due to the use of callbacks,
or something like that). It would be interesting to redo the benchmark
in C/D using plain system APIs.
- Comparing single-threaded NBIO to multi-threaded BIO is obviously wrong
when measuring peak performance. NBIO should use a pool of one thread per
core, each running an event/select loop, or alternatively one process per
core. The "Make better use of multi-cores" pro-BIO argument is pointless
for the same reason.
- There are no hints about how the benchmark was performed (e.g.
send()/recv() chunk size). For anything other than tiny packets, NBIO
for sure is not measurably slower than BIO. Latency may be a bit worse,
but that reverses once many connections come into play.
- The "simpler to write" argument also breaks down when adding fibers to
the mix.
- The main argument for NBIO is that threads are relatively heavy system
resources: context switches are rather expensive, and the number of
threads is limited (irrespective of the amount of RAM). Depending on the
kernel, the scheduler overhead may also grow with the number of threads.
For small numbers of connections, BIO for sure is perfectly fine, as long
as synchronization overhead isn't an issue.
- AIO/NBIO+fibers also allows further reducing the memory footprint by
detaching the connection from its fiber between requests (e.g. for a
keep-alive HTTP connection). This isn't possible with blocking IO.
- The optimal approach always depends on the system being modelled;
NBIO+fibers simply gives the maximum flexibility in that regard. You can
let fibers run in isolation on different threads with synchronization
between them, or you can have concurrency without CPU-level
synchronization overhead within a single thread. Especially the latter
can become really interesting with thread-local memory allocators etc.
It also becomes really interesting in situations where
thread synchronization gets difficult (lock-free structures).