Let's talk about fibers

Liran Zvibel via Digitalmars-d digitalmars-d at puremagic.com
Wed Jun 3 11:34:33 PDT 2015


Hi,

We discussed (not) moving fibers between threads at DConf last 
week, and later it was discussed in the announce group. I think 
this matter is important enough to get a thread of its own.

Software fibers/coroutines were created to make asynchronous 
programming using a Reactor (or another "event-loop I/O 
scheduler") more seamless.

For those unaware of the Reactor pattern, I advise reading [ 
http://en.wikipedia.org/wiki/Reactor_pattern ; 
http://www.dre.vanderbilt.edu/~schmidt/PDF/reactor-siemens.pdf ], 
and for some perspective on how other languages have addressed 
this I recommend watching Guido van Rossum's talk about asyncio 
and Python: https://www.youtube.com/watch?v=aurOB4qYuFM

The Reactor pattern is a long-established, widely accepted way 
to achieve low-latency async I/O; it became famous thanks to 
the Web and the C10k problem. Using a reactor is the most 
efficient way to leverage current CPU architectures to perform 
lots of I/O, for many reasons outside the scope of this post.
Another very important quality of a reactor-based approach is 
that since all event handlers serialize on a single I/O 
scheduler ("the reactor") on each thread, a correctly designed 
program frees the programmer from thinking about concurrency 
and data races.
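
To make that concrete, here is a minimal sketch of the idea in 
D (this Reactor type and its names are hypothetical, not any 
real library): handlers run one at a time, to completion, on a 
single thread.

    alias Event = void delegate();

    struct Reactor
    {
        Event[] ready;   // handlers whose I/O has completed

        void post(Event e) { ready ~= e; }

        // Run handlers one at a time, to completion, on this one
        // thread; no two handlers ever overlap, so handler code
        // needs no locks between handlers.
        void run()
        {
            while (ready.length)
            {
                auto h = ready[0];
                ready = ready[1 .. $];
                h();
            }
        }
    }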

Another thing to note: when using the reactor pattern you have 
to make sure that no event handler ever blocks. Once an event 
handler blocks, because the model is non-preemptive, the other 
event handlers cannot run, starving both them and the clients 
on the other side of the network.
Reactor implementations usually detect and report when an event 
handler takes too long before yielding control (the threshold 
is application-dependent, but should be in the microsecond 
range on current hardware).
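
A reactor's dispatch loop can do this check itself. A hedged 
sketch in D (the 100 usec budget is just a placeholder figure):

    import core.time : MonoTime, usecs;
    import std.stdio : writefln;

    // Run one handler and complain if it held the thread too long.
    // The budget is application-dependent; 100 usec is an example.
    void dispatch(void delegate() handler)
    {
        immutable start = MonoTime.currTime;
        handler();
        immutable took = MonoTime.currTime - start;
        if (took > 100.usecs)
            writefln("handler held the reactor for %s", took);
    }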

The downside of the reactor pattern used to be that the 
programmer had to manually keep the state/context of each 
event handler. Since each "logical" operation comprises many 
I/O transactions (some network protocol to keep track of, 
maybe a networked DB to query for data, local/remote files to 
read and write, etc.), the reactor would keep a context for 
each callback and I/O event, and the programmer had to update 
that context and manually register new event handlers for 
every extra I/O transaction, in many cases changing callback 
registrations along the way.
This downside means it is more difficult to program for a 
reactor model, but since programmers don't have to think about 
races and concurrency issues (and then debug them...), in our 
experience it is still more efficient than programming with 
general-purpose threads if you care about 
correctness/coherency.
One way to mitigate this complexity was the Proactor pattern: 
implementing higher-level async I/O services over the reactor, 
thus sparing the programmer a lot of the low-level context 
headaches.
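
For illustration, a hedged sketch of the manual-context style 
described above (all names here are hypothetical): every step 
must stash its state in a context struct and hand-register the 
next callback.

    struct RequestContext
    {
        int    fd;        // the connection being served
        string dbReply;   // filled in before step 2 runs
    }

    alias Handler = void function(RequestContext*);

    // stand-in for a real reactor's registration call
    void registerOnReadable(int fd, RequestContext* ctx, Handler next)
    {
        // ... arm the fd in the event loop, remember ctx and next ...
    }

    void step2_onDbReply(RequestContext* ctx)
    {
        // parse ctx.dbReply, then register yet another handler
        // for the next I/O transaction...
    }

    void step1_onRequest(RequestContext* ctx)
    {
        // Issue the DB read, then return to the reactor. Any state
        // needed later must live in ctx, not on this stack frame.
        registerOnReadable(ctx.fd, ctx, &step2_onDbReply);
    }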

Up until now I have not said anything about fibers/coroutines.

What fibers bring to the table is the ability to program 
within the reactor model without manually keeping a context 
separate from the program logic, and without manually 
re-registering callbacks for different I/O events.
D's fibers allowed us to create an async I/O library with 
support for network/file/disk operations and higher-level 
conditions (waiters, barriers, etc.) that lets the programmer 
write code as if it ran in its own thread (almost: sometimes 
fibers are explicitly "spawned", i.e. added to the reactor, 
and fiber conditions differ slightly from spawning and joining 
threads) without paying the huge correctness/coherence and 
performance penalties of the threading model.
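
As a minimal sketch of what that looks like with druntime's 
core.thread.Fiber (the I/O itself is elided; a real library 
would yield inside its read/write calls):

    import core.thread : Fiber;
    import std.stdio : writeln;

    void main()
    {
        // The handler reads top to bottom, as if it had its own thread.
        auto handler = new Fiber({
            writeln("issue DB query");   // a real library starts async I/O here
            Fiber.yield();               // back to the reactor until the reply lands
            writeln("got reply, answer the client");
        });

        handler.call();   // runs until the first yield
        // ... the reactor would resume it once the I/O completes ...
        handler.call();   // continues right after the yield
    }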

There are two main reasons why it does not make sense to move 
fibers between threads:

1. You'll start having concurrency issues. Let's assume we 
have a main fiber that received some request, and it spawns 3 
fibers looking into different DBs to get some info and update 
an array with the data. The array will probably be on the 
stack of the first fiber. If fibers don't move between 
threads, there is nothing to worry about (as expected by the 
model). If you start moving fibers across threads, you have to 
start guarding this array to make sure it stays coherent (see 
the sketch after this list).
This is a simple example, but it shows that you'd be "losing" 
one of the biggest selling points of the whole reactor-based 
model.

2. Fibers and reactor-based I/O work well (read: make sense) 
when you have lots of concurrent, very small transactions 
(similar to the Web C10k problem or a storage machine). In 
this case, if one of the threads has more capacity than the 
rest, the I/O scheduler ("reactor") will simply spawn the 
fibers accepting new transactions on that thread. If you are 
not in a situation where balancing can be done by placing new 
requests in the right place, then you probably should not use 
the reactor model, but a different one that suits your 
application better.
Currently we can spawn another reactor to take more load, but 
the load is balanced statically at a system-wide level. On 
previous projects we had several reactors running on different 
threads, providing very different functionality (with 
different handlers, naturally).
We never hit a situation where moving a fiber between threads 
made any sense.
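
Here is the sketch promised in point 1 (hypothetical names, 
with the DB lookups stubbed out): three child fibers fill 
slots of an array owned by the parent fiber, and because all 
of them share one thread, the writes are serialized for free.

    import core.thread : Fiber;

    Fiber makeLookup(string[] results, size_t idx)
    {
        return new Fiber({
            // a real implementation would yield here while DB #idx answers
            results[idx] = "reply from DB";
        });
    }

    void handleRequest()
    {
        string[3] results;     // owned by the requesting fiber

        Fiber[3] lookups;
        foreach (i; 0 .. 3)
            lookups[i] = makeLookup(results[], i);

        foreach (f; lookups)
            f.call();          // a real reactor resumes each as its I/O lands

        // All three slots are filled, and no lock was ever taken.
        // Had the lookups migrated to other threads, `results`
        // would need guarding.
    }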

As we see, there is nothing to gain and lots to lose by moving 
fibers between threads.

Now, if we want to make sure fibers are well supported in D, 
there are several other things we should do:

1. Implement a good async I/O library that supports 
fiber-based programming. I don't know Vibe.d very well (read: 
at all); maybe we (Weka.IO) can help review it and suggest 
ways to make it into a general async I/O library (we have over 
15 years' experience developing with the reactor model in many 
environments).

2. Add better compiler support. The one problem with fibers is 
that upon creation you have to know the stack size for that 
fiber, and different functions create different stack depths. 
It is very convenient to use the stack to hold all objects 
(recall Walter's first-day talk, for example), and it can 
serve as a very convenient way to "garbage collect" all 
resources acquired during the run of that fiber, but currently 
we don't leverage it to the max because we have no good way to 
know/limit the amount of memory used this way.
If the compiler could analyze stack usage by functions 
(recursively) and give us hints about the upper bound of stack 
usage, we would be able to use the stack more aggressively and 
utilize memory much better (see the sketch after this list).
Also -- I think such static analysis would be a big selling 
point of D for systems like ours.
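
To illustrate point 2: the stack size handed to a Fiber today 
is the programmer's guess (a hedged sketch; the 16 KiB figure 
is arbitrary). A compiler-derived upper bound would let that 
guess become exact.

    import core.thread : Fiber;

    void shallowHandler()
    {
        // small locals only; an upper-bound analysis could prove
        // that 16 KiB of stack is plenty for this call tree
        char[512] buf;
        buf[] = 0;
        // ... the actual handler work ...
    }

    void main()
    {
        // the second argument is the fiber's stack size -- today a
        // programmer's guess, with analysis it could be exact
        auto f = new Fiber(&shallowHandler, 16 * 1024);
        f.call();
    }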

I think everything is now written down, and we can continue 
the discussion here.

Liran.

