Asynchronicity in D

Brad Roberts braddr at slice-2.puremagic.com
Fri Apr 1 18:08:13 PDT 2011


On Fri, 1 Apr 2011, dsimcha wrote:

> On 4/1/2011 7:27 PM, Sean Kelly wrote:
> > On Apr 1, 2011, at 2:24 PM, dsimcha wrote:
> > 
> > > == Quote from Brad Roberts (braddr at puremagic.com)'s article
> > > > I've got an app that regularly runs with hundreds of thousands of
> > > > connections (though most of them are mostly idle).  I haven't seen it
> > > > break 1M yet, but the only thing stopping it is file descriptor limits
> > > > and memory.  It runs a very standard 1 thread per cpu model.
> > > > Unfortunately, not yet in D.
> > > > Later,
> > > > Brad
> > > 
> > > Why/how do you have all these connections open concurrently with only
> > > a few threads?  Fibers?  A huge asynchronous message queue to deal
> > > with new requests from connections that aren't idle?
> > 
> > A huge asynchronous message queue.  State is handled either explicitly or
> > implicitly via fibers.  After reading Brad's statement, I'd be interested in
> > seeing a comparison of the memory and performance differences of a thread
> > per socket vs. asynchronous model though (assuming that sockets don't need
> > to interact, so no need for synchronization).
> 
> From the discussions lately I'm thoroughly surprised just how specialized
> a field massively concurrent server programming apparently is.  Since it's
> so far from the type of programming I do, my naive opinion was that it
> wouldn't take a Real Programmer from another specialty (though I emphasize
> Real Programmer, not code monkey) long to get up to speed.

I won't go into the why part; it's not interesting here, and I probably 
can't talk about it anyway.

The simplified view of how: no fibers, just a small number of kernel 
threads (via pthread).  An epoll thread queues tasks that are pulled by 
the one-per-cpu worker threads.  The queue only grows as large as the 
outstanding work to do, and as long as socket events arrive more slowly 
than the workers can process them, it stays essentially empty.
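
To make that concrete, here's a rough sketch of that shape in D, using 
core.sys.linux.epoll plus core.thread and a mutex/condition-protected 
queue.  It's not our actual code; handleReadable() and the array-backed 
queue are just placeholders for illustration:

// One epoll thread feeds a shared queue of ready fds; one worker thread
// per CPU drains it.  Linux-only sketch, error handling omitted.
import core.sync.condition : Condition;
import core.sync.mutex : Mutex;
import core.sys.linux.epoll;
import core.thread : Thread;
import std.parallelism : totalCPUs;

__gshared Mutex queueLock;
__gshared Condition queueReady;
__gshared int[] readyFds;   // the task queue: fds with pending events
__gshared int epfd;

// Placeholder for the real per-connection work (read, parse, respond).
void handleReadable(int fd) { }

void epollLoop()
{
    epoll_event[64] events;
    for (;;)
    {
        int n = epoll_wait(epfd, events.ptr, cast(int) events.length, -1);
        if (n <= 0) continue;
        queueLock.lock();
        foreach (i; 0 .. n)
            readyFds ~= events[i].data.fd;
        queueReady.notifyAll();
        queueLock.unlock();
    }
}

void workerLoop()
{
    for (;;)
    {
        queueLock.lock();
        while (readyFds.length == 0)
            queueReady.wait();   // an empty queue means the workers are keeping up
        int fd = readyFds[0];
        readyFds = readyFds[1 .. $];
        queueLock.unlock();
        handleReadable(fd);
    }
}

void main()
{
    queueLock  = new Mutex;
    queueReady = new Condition(queueLock);
    epfd = epoll_create1(0);
    // Accepted sockets get registered via epoll_ctl(epfd, EPOLL_CTL_ADD, ...);
    // the accept path is omitted here.
    foreach (i; 0 .. totalCPUs)
        new Thread(&workerLoop).start();
    epollLoop();   // the dedicated epoll thread
}

At least in this toy version, the queue is the only state shared between 
the epoll side and the workers.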

It's actually quite a simple architecture at the 50k-foot view.  Having 
recently hired some new people, I've got recent evidence... it doesn't 
take a lot of time to fully 'get' the network layer of the system.  
There are other parts that are more complicated, but they're not part of 
this discussion.

A thread per socket would never handle this load.  Even with a 4k stack 
(which you'd have to be _super_ careful with, since C/C++/D does nothing 
to help you track stack usage), a million connections means roughly 4G 
of RAM spent on stacks alone.  And that's before you get near the data 
structures for all the sockets, etc.
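
For reference, the back-of-the-envelope math, plus the one knob D does 
give you: core.thread's Thread constructor takes an optional stack size, 
though nothing warns you when a handler outgrows it.  The handler below 
is just a placeholder:

// Rough arithmetic behind "4G of RAM on just the stacks" for ~1M
// connections, and a thread created with a smaller-than-default stack.
import core.thread : Thread;
import std.stdio : writefln;

void connectionHandler() { /* per-socket work would live here */ }

void main()
{
    enum connections = 1_000_000;   // the figure from earlier in the thread
    enum stackBytes  = 4 * 1024;    // the 4k stack mentioned above
    writefln("%s threads x %s KB stacks = ~%.1f GB just for stacks",
             connections, stackBytes / 1024,
             cast(double) connections * stackBytes / 1e9);

    // A smaller stack can be requested per thread, but pthreads may refuse
    // anything under PTHREAD_STACK_MIN (~16k on glibc), and whether your
    // code actually fits in it is entirely your problem to track.
    auto t = new Thread(&connectionHandler, 64 * 1024);
    t.start();
    t.join();
}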

Later,
Brad

