Asynchronicity and more
Fawzi Mohamed
fawzi at gmx.ch
Sat Apr 2 17:24:50 PDT 2011
There are several difficult issues connected with asynchronicity, high
performance networking and related topics.
I had to deal with them while developing blip ( http://fawzi.github.com/blip ).
My goal with it was to have a good basis for my program dchem; as a
consequence it is not especially optimized for non recursive tasks, and
it is D1, but I think that the issues are generally relevant.
i/o and asynchronicity are very important aspects, ones that will
tend to "pollute" many parts of the library and introduce
dependencies that are difficult to remove; thus those choices have to
be made carefully.
Overview:
========
Threads vs fibers:
-----------------------
* an issue not yet brought up is that threads wire some memory, and so
have an extra cost that fibers don't.
* the evaluation strategy of fibers can be chosen by the user; this is
relevant for recursive tasks where each task spawns other tasks, as
different strategies use very different resources (breadth first
evaluation, as threads use, keeps many more tasks concurrently in
evaluation and so uses a *lot* more resources than depth first).
Otherwise the relevant points already brought forth by others are:
- context switches of fibers (assuming that the memory is active) are
much faster
- context switches are chosen by the user with fibers (cooperative
multitasking); this allows one to choose the optimal point to switch,
but a "bad" fiber can ruin the response time of the others (see the
sketch after this list)
- D is not stackless (unlike Go, for example), so each fiber needs to
have enough space for its stack (something that is often not so easy
to predict). This makes fibers still a bit costly if one really needs
a lot of them. 64 bit can help here, because hopefully the active part
is small and can be kept in RAM, even when using a rather large
virtual address space. Still, as correctly said by Brad, for heavily
uniform handling of many tasks manual management (and using stateless
functions as much as possible) can be much more efficient.
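To make the cooperative switching concrete, here is a minimal sketch
with druntime's core.thread.Fiber, where the code decides exactly
where the switch happens:

import core.thread;
import std.stdio;

void worker()
{
    writeln("step 1");
    Fiber.yield();    // the switch point is chosen by the code, not the OS
    writeln("step 2");
}

void main()
{
    auto f = new Fiber(&worker);
    f.call();                  // runs worker up to the yield
    writeln("between steps");  // any other work (or fiber) can run here
    f.call();                  // resumes worker after the yield
}

A fiber that computes for a long time without yielding is exactly the
"bad" fiber mentioned above: nothing else on its thread runs until it
yields or returns.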
Closures
------------
When possible, and for the low level (often used) operations, structs
and manual memory handling for "closures" are a better choice than
delegates with automatic closures, because one can avoid the heap
allocation connected with the automatic closure.
In D1 this approach cannot be avoided, whereas D2 has the very useful
automatic closures, but at low level their cost should be avoided when
possible.
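A minimal sketch of the difference in D2 (the names are mine, for
illustration only):

import std.stdio;

// automatic closure: count escapes, so the compiler heap-allocates
// the captured frame
int delegate() makeCounter()
{
    int count = 0;
    return { return ++count; };
}

// manual "closure": the state lives in an explicit struct, and a
// delegate to one of its methods involves no hidden allocation
struct Counter
{
    int count;
    int opCall() { return ++count; }
}

void main()
{
    auto dg = makeCounter();       // one heap allocation per call
    Counter c;                     // stack (or manually managed) storage
    auto manual = &c.opCall;       // delegate over the struct, no allocation
    writeln(dg(), " ", manual());  // prints: 1 1
}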
About using structs there are subtle issues that I think are connected
with compiler optimization (I never really investigated them, I always
changed the code, or resorted to heap allocation).
The main issue is that the compiler wants to optimize as much as
possible, and to do that it normally assumes that the current thread is
the only user of the stack. If you pass stack-stored structures to
other threads these assumptions aren't true anymore, so the memory of a
stack allocated struct might be reused even before the function
returns (unless I am mistaken and the ABI forbids it, in this case
tell me).
Async i/o
----------
* almost always i/o is much slower than the CPU, so an i/o operation is
bound to make the cpu wait, and one wants to use that wait efficiently.
- A very simple way is to just use blocking i/o, and just have
other threads do other work.
- async i/o allows overlap of several operations in a single thread.
- for files an even more efficient way is to share the buffer with
the kernel (the aio_* calls)
- an important issue is avoiding the waste of cpu cycles while
waiting; to achieve this one can collect several pending operations and
use a single thread to wait on all of them. select, poll and epoll
allow this, and increase the efficiency of several kinds of programs
(see the sketch after this list)
- libev and libevent are cross platform libraries that help with an
event based approach, taking care of checking a large number of
events and calling a user defined callback when they happen, in a
robust cross platform way
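As an illustration of waiting on several operations with a single
thread, here is a minimal Linux-only sketch with epoll, assuming the
core.sys.linux.epoll bindings (or equivalent extern(C) declarations)
and descriptors that are already open and non blocking; error handling
is omitted:

import core.sys.linux.epoll;
import core.sys.posix.unistd : read;
import std.stdio;

void waitLoop(int[] fds)
{
    int epfd = epoll_create1(0);   // one epoll instance for all fds
    foreach (fd; fds)
    {
        epoll_event ev;
        ev.events = EPOLLIN;       // wake up when fd becomes readable
        ev.data.fd = fd;
        epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
    }

    epoll_event[16] ready;
    for (;;)
    {
        // a single blocking wait covers all registered descriptors
        int n = epoll_wait(epfd, ready.ptr, cast(int) ready.length, -1);
        foreach (i; 0 .. n)
        {
            ubyte[4096] buf;
            auto got = read(ready[i].data.fd, buf.ptr, buf.length);
            writefln("fd %s: %s bytes", ready[i].data.fd, got);
        }
    }
}

libev and libevent essentially wrap this kind of loop (and kqueue,
poll, etc. on other platforms) behind a callback interface.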
Locks, semaphores
------------
Locks and semaphores are the standard way to synchronize between
threads.
One has to be careful when mixing them with fiber scheduling, as one
can easily deadlock.
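What the fiber/lock interaction looks like when it goes wrong, as a
minimal sketch (the fiber scheduler is left implicit; this shows the
shape of the problem, not blip's API):

import core.sync.mutex;
import core.thread;

__gshared Mutex m;
shared static this() { m = new Mutex; }

void fiberA()
{
    m.lock();
    Fiber.yield();  // suspends while still holding a thread-level lock
    m.unlock();
}

void fiberB()
{
    // If fiberB runs on another thread, this lock() blocks that whole
    // thread, so no fiber scheduled on it can run; if fiberA is never
    // resumed, nobody ever unlocks and we deadlock. If fiberB runs on
    // the *same* thread, a recursive mutex sees the same owning thread
    // and lets it in, silently breaking mutual exclusion between fibers.
    m.lock();
    m.unlock();
}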
Hardware information
-----------------------------
Efficient usage of computational resources depends also on being able
to identify the available hardware.
Don did quite some hacking to get useful information out of cpuinfo,
but if one is interested in more complex computers more info would be
nice.
I use hwloc for this purpose; it is cross platform and can be
embedded.
Possible solutions
==============
Async i/o can be presented as normal synchronous (blocking) i/o, but
this makes sense only if one has several objects waiting, or uses
fibers and executes other fibers while waiting.
How acceptable is it to rely on (and thus introduce a dependency on)
things like libev or hwloc?
For my purposes using them was ok, and they are cross platform and
embeddable, but is the same true for phobos?
Asynchronicity means being able to have work executed concurrently
and then resynchronize at a later point.
One can use processes (which also give memory protection), threads, or
fibers to achieve this.
If one uses just threads, then asynchronous i/o makes sense only with
fully manual (explicit) handling of it; hiding it away would be
equivalent to blocking i/o.
Fibers allow one to hide async io and make it look blocking, but as
Sean said there are issues with using fibers with D2 TLS.
I kind of dislike the use of TLS for non low level infrastructure
stuff, but that is just me around here, it seems.
In blip I chose to go with fiber based switching.
I wrapped libev both at a low level and at a higher level, in such a
way that one can use it directly (for maximum performance).
For the sockets I use non blocking calls and a single "waiting" (io)
thread, but hide them so that they are used just like blocking calls.
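The shape of that trick as a sketch; registerAndSuspend is a
hypothetical hook into the fiber scheduler and the waiting thread, not
blip's actual API:

import core.thread;
import core.stdc.errno;
import core.sys.posix.unistd : read;

// hypothetical: ask the waiting (io) thread to wake the current fiber
// when fd becomes readable, then yield to the scheduler
void registerAndSuspend(int fd);

// looks like a blocking read to the caller, but never blocks the thread
ptrdiff_t fiberRead(int fd, void[] buf)
{
    for (;;)
    {
        auto got = read(fd, buf.ptr, buf.length);  // fd is O_NONBLOCK
        if (got >= 0)
            return got;               // data (or end of file) arrived
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            registerAndSuspend(fd);   // other fibers run while we wait
        else
            return got;               // a real error, report it
    }
}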
An important design decision when using fibers is whether one should
be able to have a "naked" thread, or whether the fiber scheduling
should be hidden in each thread.
In blip I went for yes (naked threads are allowed), because blip is
realized entirely as a normal library, but that gives some ugly corner
cases when one uses a method that wants to suspend a thread that has
no scheduling in place.
Building the scheduling into all threads is probably cleaner if one
goes with fibers.
The problem of TLS and fibers remains though, especially if one allows
the migration of fibers from one thread to another (as I do in blip).
An important design choice in blip was being able to cope with
recursive parallelism (typical of computational tasks), not just with
the concurrent parallelism that is typical of servers.
I feel that this is important, but it is something that might not be
seen as such by others.
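To illustrate what I mean by recursive parallelism, here is a sketch
using Phobos' std.parallelism rather than blip's API: each call spawns
a subtask and keeps working depth first on the other half before
resynchronizing.

import std.parallelism;
import std.stdio;

ulong fib(uint n)
{
    if (n < 25)  // serial cutoff: below this, task overhead dominates
        return n < 2 ? n : fib(n - 1) + fib(n - 2);
    auto left = task!fib(n - 1);  // subtask, may be picked up by a worker
    taskPool.put(left);
    auto right = fib(n - 2);      // depth first: keep computing ourselves
    return left.yieldForce + right;  // resynchronize at a later point
}

void main()
{
    writeln(fib(40));
}

Every call in the tree spawns further tasks, so the evaluation
strategy (depth first vs breadth first) matters a lot here, as noted
in the threads vs fibers section above.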
To do
====
Now about async io: the first step is for sure to expose an
asynchronous API. This doesn't influence, or depend on, other parts of
the library much.
An important decision is if (and which) external libraries one can
rely on.
Making the async API nicer to use, or even using it "behind the
scenes" as I do in blip, needs more complex choices on the basic
handling of suspension and synchronization.
Something like that is bound to be used in several parts of phobos, so
a careful choice is needed.
These parts are also partially connected with high performance
networking (another GSoC project).
Fawzi