Threads don't scale, processes do - Fibers

Oliver Dathe o.dathe at gmx.de
Wed Aug 27 09:49:50 PDT 2008


 > Fibers (as well as Java’s green threads) are not an alternative to
 > heavyweight threads. Fibers can’t run in parallel, so they have no
 > way to use multiple processors. They are more of a flow-of-control
 > construct, often used interchangeably with coroutines.

Fibers can be passed between Threads as long as they are not currently 
executing. This may be implementation-dependent, but in general it does 
not seem to be a big issue. See Tango's Fibers.

I have frequently thought about a model in which an application-dependent 
Scheduler/Proxy/Supervisor S interchanges a set of Fibers between 
a set of Threads, taking control on each yield(). It is based on the 
following scenario:

- Create Fibers for all kinds of independent tasks
   (Game scenario: Graphics, Sound, Logic, Network, Peripheral)
- These Fibers don't call() each other but frequently yield()
- These Fibers only get call()ed by S, and so only yield() to S
- N Fibers run on M Threads that are known to S. Let's assume N>M.
- S may interchange Fibers between the Threads on each yield() call
   according to a policy, and may suspend Threads if there is too
   little work to be done.
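The scenario above can be sketched in Python, with generators standing in 
for Fibers and a shared run queue as the policy's simplest possible form. 
The Scheduler class and all names here are illustrative, not taken from 
Tango or any real library:

```python
import queue
import threading

def make_fiber(name, steps):
    # A generator: each yield hands control back to the scheduler S.
    def fiber():
        for i in range(steps):
            yield f"{name}:{i}"
    return fiber()

class Scheduler:
    def __init__(self, fibers, num_threads):
        self.runnable = queue.Queue()     # suspended fibers, ready to resume
        for f in fibers:
            self.runnable.put(f)
        self.results = []
        self.lock = threading.Lock()      # S itself must be synchronized
        self.threads = [threading.Thread(target=self.worker)
                        for _ in range(num_threads)]

    def worker(self):
        while True:
            fiber = self.runnable.get()
            if fiber is None:             # shutdown sentinel
                return
            try:
                step = next(fiber)        # run fiber until its next yield()
                with self.lock:
                    self.results.append(step)
                self.runnable.put(fiber)  # any worker Thread may resume it
            except StopIteration:
                pass                      # fiber finished
            finally:
                self.runnable.task_done()

    def run(self):
        for t in self.threads:
            t.start()
        self.runnable.join()              # wait until every fiber finishes
        for _ in self.threads:
            self.runnable.put(None)
        for t in self.threads:
            t.join()
        return self.results

# N=3 fibers interchanged between M=2 worker Threads.
out = Scheduler([make_fiber(n, 2) for n in ("A", "B", "C")], 2).run()
```

Note that a fiber sits in the queue only while it is suspended, so it is 
never resumed on two Threads at once, which is exactly the precondition 
for passing Fibers between Threads stated above.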

Some extended demands which could be well accomplished by this pattern:

1.) Let S try to keep up to C=(number of CPUs) Threads busy, so that 
these never enter a blocked state from the OS's point of view (see 3 and 
4). Fibers can get switched between Threads at a higher frequency than 
the OS's scheduler switches Threads; that frequency is determined by how 
often yield() is used. The OS sees nothing of this internal switching.

2.) Tie priorities and timing constraints to Fibers. When the distributed 
Fibers yield() at an appropriate frequency, S can enforce these 
constraints according to the given policy. This may also incorporate 
Thread affinity to reduce the impact of Thread/CPU switching.
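A minimal single-threaded sketch of the priority part of 2.), again with 
Python generators as stand-ins for Fibers (the heap-based scheme and all 
names are illustrative): at every yield(), S simply resumes the runnable 
fiber with the most urgent priority.

```python
import heapq
from itertools import count

def task(name, steps):
    # A fiber that yields control 'steps' times before finishing.
    for i in range(steps):
        yield f"{name}{i}"

def run_by_priority(prioritized_fibers):
    # prioritized_fibers: list of (priority, generator); smaller = more urgent.
    order = []
    tie = count()                       # FIFO among equal priorities
    heap = [(p, next(tie), f) for p, f in prioritized_fibers]
    heapq.heapify(heap)
    while heap:
        prio, _, fiber = heapq.heappop(heap)
        try:
            order.append(next(fiber))   # run fiber until its next yield()
            heapq.heappush(heap, (prio, next(tie), fiber))  # stays scheduled
        except StopIteration:
            pass                        # fiber finished
    return order

# Network fiber (priority 1) is always preferred over graphics (priority 2).
order = run_by_priority([(1, task("net", 2)), (2, task("gfx", 2))])
```

This strict policy would of course starve low-priority fibers; a real 
policy for S would weigh timing constraints and affinity as well, but the 
mechanism of choosing at each yield() point is the same.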

3.) Fibers could announce blocking calls to S, so that it can transfer 
them to a specific set of Threads created only for the purpose of 
getting blocked. This avoids any impact on the Fibers in 1.) and 2.):

   yieldAndAnnounceBlockingCall();
     // S transfers the Fiber to some Thread dedicated only to blocking,
     // so that 1.) and 2.) can still be accomplished.

   blocking_call();
     // OS suspends execution

   yieldAndAnnounceBlockingFinished();
     // S may transfer the Fiber back to the set of Threads for 1.)
     // or may add the current Thread to the set of Threads for 1.)
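Under the same illustrative assumptions as before, the announce-blocking 
protocol of 3.) can be sketched with two Thread pools and a tag yielded 
by the fiber; the tags, queues, and names are invented for this sketch:

```python
import queue
import threading

BLOCKING, DONE = "blocking", "done"

def io_fiber(log):
    log.append("compute")        # runs on a compute-pool thread
    yield BLOCKING               # yieldAndAnnounceBlockingCall()
    log.append("blocking-call")  # now runs on a blocking-pool thread
    yield DONE                   # yieldAndAnnounceBlockingFinished()
    log.append("compute-again")  # back on a compute-pool thread

def run(fiber):
    compute_q, blocking_q = queue.Queue(), queue.Queue()
    compute_q.put(fiber)
    def worker(own_q, other_q, move_on):
        while True:
            f = own_q.get()
            if f is None:                # shutdown sentinel
                return
            try:
                tag = next(f)            # resume until the fiber's next yield()
                # The announced tag tells S which pool resumes the fiber next.
                (other_q if tag == move_on else own_q).put(f)
            except StopIteration:        # fiber finished: stop both pools
                own_q.put(None)
                other_q.put(None)
    t1 = threading.Thread(target=worker, args=(compute_q, blocking_q, BLOCKING))
    t2 = threading.Thread(target=worker, args=(blocking_q, compute_q, DONE))
    t1.start(); t2.start(); t1.join(); t2.join()

log = []
run(io_fiber(log))
```

The compute Thread never executes the blocking segment, so from the OS's 
point of view it stays runnable the whole time, as 1.) demands.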

4.) Synchronization of internal resources by S:

   yieldAndLock( someSharedResource );
     // Scheduler handles this, possibly without OS support

   use( someSharedResource );

   yieldAndUnlock( someSharedResource );
     // so that another request can be served

The latter could make locking a significantly less expensive operation, 
since it circumvents the need for OS-level synchronization. If the 
Fibers respect the protocol, then S knows exactly when a resource R is 
(un)used and when there are waiters. Nor does S have to fear 
asynchronous concurrency (yes, S needs to be synchronized itself). 
Blocked waiting for R does not become an OS-supported blocking call but 
rather manifests in the fact that S does not transfer the requesting 
Fiber to a Thread as long as that Fiber is in an internal blocking state 
(the OS sees nothing of it).
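A single-threaded Python sketch of this scheduler-level locking (all 
names invented for illustration): S tracks the holder and the waiters of 
each resource itself and simply does not reschedule a parked fiber, so 
no per-resource OS lock is involved.

```python
from collections import deque

def worker(name, resource, log):
    yield ("lock", resource)          # yieldAndLock( someSharedResource )
    log.append(f"{name} uses {resource}")
    yield ("unlock", resource)        # yieldAndUnlock( someSharedResource )

def schedule(fibers):
    runnable = deque(fibers)
    holder = {}                       # resource -> fiber currently holding it
    waiters = {}                      # resource -> fibers parked on it
    while runnable:
        f = runnable.popleft()
        try:
            op, r = f.send(None)      # resume fiber until its next yield()
        except StopIteration:
            continue                  # fiber finished
        if op == "lock":
            if r in holder:           # resource busy: park the fiber;
                waiters.setdefault(r, deque()).append(f)  # OS sees nothing
            else:
                holder[r] = f         # grant the lock, keep fiber runnable
                runnable.append(f)
        elif op == "unlock":
            del holder[r]
            if waiters.get(r):        # wake one waiter, hand it the lock
                g = waiters[r].popleft()
                holder[r] = g
                runnable.append(g)
            runnable.append(f)

# Three fibers contend for the same resource R, strictly one at a time.
log = []
schedule([worker(n, "R", log) for n in ("A", "B", "C")])
```

Mutual exclusion here falls out of the scheduling decision alone: a 
waiter is not an OS-blocked thread, just a fiber that S declines to 
resume until the resource is free.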

I came up with this model while thinking about a flexible architecture 
for games, where very different kinds of competing constraints/QoS must 
be satisfied, such as network and peripheral IO (delay constraints) and 
graphics/physics (overall-amount-of-time constraints). The traditional 
way (making as much work as possible available to the OS's scheduler, 
using blocking calls, synchronizing through locks, and hoping the OS 
will schedule appropriately) did not seem satisfying to me. Letting an 
internal Scheduler/Proxy/Supervisor S interchange a set of Fibers 
between a set of Threads according to an application-dependent policy, 
having the Fibers announce blocking calls to S, and synchronizing 
explicitly through S seemed very promising indeed.

If someone knows of similar attempts please let me know.



More information about the Digitalmars-d mailing list