Asynchronicity in D

Jose Armando Garcia jsancio at gmail.com
Mon Apr 4 18:08:49 PDT 2011


The problem with threads is the context switch, not really the stack
size. Threads are not a solution for increasing performance. In
high-performance systems, threads are used for fairness in the
request-response pipeline, not for performance. Obviously, nobody
disputes this on a uniprocessor. With the availability of cheap
multi-processor, multi-core, and hyper-threaded hardware, though,
multiple threads are needed to keep all logical processors busy. In
other words, multiple threads are needed to get the most out of the
hardware even if you don't care about fairness.
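
To make the last point concrete, here is a minimal D sketch that
spawns one worker thread per logical processor; the per-thread work is
just a placeholder:

import core.thread : Thread;
import std.parallelism : totalCPUs;
import std.stdio : writefln;

void main()
{
    Thread[] workers;

    // One kernel thread per logical processor: enough to keep every
    // core and hyper-thread busy without paying for extra context
    // switches between competing threads.
    foreach (cpu; 0 .. totalCPUs)
    {
        auto t = new Thread(() {
            // A real server would run its per-connection work here.
        });
        t.start();
        workers ~= t;
    }

    foreach (t; workers)
        t.join();

    writefln("ran %s workers, one per logical CPU", totalCPUs);
}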

Now, the argument above doesn't take implementability into account.
Most people write sequential multithreaded code because it is "easier"
(I personally think it is harder not to violate invariants in the
presence of concurrency and sharing). Many people feel it is easier to
stretch a programmer's sequential, shared-memory mental model than to
make the paradigm switch to an event-based model.
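
For contrast, the sequential shape is roughly the following D sketch:
one blocking loop per client, each run on its own thread. This is only
an illustration; the echo logic is a stand-in for real work.

import std.socket : Socket;

// Thread-per-connection style: straight-line, blocking code that is
// easy to follow, but each client ties up a whole thread (and stack).
void serveClient(Socket client)
{
    scope (exit) client.close();
    ubyte[4096] buf;
    for (;;)
    {
        auto n = client.receive(buf[]);  // blocks this thread
        if (n <= 0)
            break;                       // disconnect or error
        client.send(buf[0 .. n]);        // echo the bytes back
    }
}

Each accepted connection would get its own
new Thread(() => serveClient(client)).start(). The event-based
alternative is sketched after Brad's description below.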

On Mon, Apr 4, 2011 at 6:49 PM, Sean Kelly <sean at invisibleduck.org> wrote:
> On Apr 1, 2011, at 6:08 PM, Brad Roberts wrote:
>
>> On Fri, 1 Apr 2011, dsimcha wrote:
>>
>>> On 4/1/2011 7:27 PM, Sean Kelly wrote:
>>>> On Apr 1, 2011, at 2:24 PM, dsimcha wrote:
>>>>
>>>>> == Quote from Brad Roberts (braddr at puremagic.com)'s article
>>>>>> I've got an app that regularly runs with hundreds of thousands of
>>>>>> connections (though most of them are mostly idle).  I haven't seen it
>>>>>> break 1M yet, but the only thing stopping it is file descriptor limits
>>>>>> and
>>>>>> memory.  It runs a very standard 1 thread per CPU model.  Unfortunately,
>>>>>> not yet in D.
>>>>>>
>>
>> I won't go into the why part, it's not interesting here, and I probably
>> can't talk about it anyway.
>>
>> The simplified view of how: no fibers, just a small number of kernel
>> threads (via pthread).  An epoll thread queues tasks that are pulled
>> by the one-per-CPU worker threads.  The queue is only as big as the
>> outstanding work to do.  Assuming socket events arrive more slowly
>> than the workers can process them, the queue stays empty.
>>
>> It's actually quite a simple architecture at the 50k-foot view.  Having
>> recently hired some new people, I've got fresh evidence... it doesn't
>> take a lot of time to fully 'get' the network layer of the system.
>> There are other parts that are more complicated, but they're not part
>> of this discussion.
>>
>> A thread per socket would never handle this load.  Even with a 4k stack
>> (which you'd have to be _super_ careful with, since C/C++/D do nothing
>> to help you track usage), you'd be spending 4G of RAM on the stacks
>> alone: 4 KB x ~1M connections.  And that's before you get near the
>> data structures for all the sockets, etc.
>
> I misread your prior post as one thread per socket and was a bit baffled.  Makes a lot more sense now.  Potentially one read event per socket still means a pretty long queue though.
>
> Regarding the stack size... is that much of an issue with 64-bit processes?  Figure a few pages of committed memory per thread plus a large reserved range that shouldn't impact things at all.  Definitely more than the event model, but maybe not tremendously so?
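
The design Brad describes above (an epoll dispatcher feeding a queue
that is drained by one worker per CPU) would look roughly like the
following D sketch. This is a hedged illustration, not his code: it
assumes Linux (core.sys.linux.epoll), elides socket setup and error
handling, and handle() is a stand-in for the application logic.

import core.sync.condition : Condition;
import core.sync.mutex : Mutex;
import core.sys.linux.epoll;
import core.thread : Thread;
import std.parallelism : totalCPUs;

// Ready sockets, queued by the epoll thread and drained by workers.
__gshared int[] readyFds;
__gshared Mutex queueLock;
__gshared Condition queueNonEmpty;

// Stand-in for the application logic: read the data, act on it.
void handle(int fd)
{
}

void worker()
{
    for (;;)
    {
        int fd;
        synchronized (queueLock)
        {
            // Sleep until the dispatcher queues something.
            while (readyFds.length == 0)
                queueNonEmpty.wait();
            fd = readyFds[0];
            readyFds = readyFds[1 .. $];
        }
        handle(fd);
    }
}

// The epoll thread: block until sockets are ready, then queue them.
void dispatcher(int epfd)
{
    epoll_event[64] events;
    for (;;)
    {
        auto n = epoll_wait(epfd, events.ptr, cast(int) events.length, -1);
        if (n <= 0)
            continue;
        synchronized (queueLock)
        {
            foreach (ev; events[0 .. n])
                readyFds ~= ev.data.fd;
            queueNonEmpty.notifyAll();
        }
    }
}

void main()
{
    queueLock = new Mutex;
    queueNonEmpty = new Condition(queueLock);

    int epfd = epoll_create1(0);
    // ... register each non-blocking socket with
    //     epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) as it is accepted ...

    foreach (i; 0 .. totalCPUs)
        new Thread(&worker).start();

    dispatcher(epfd);  // run the epoll loop on the main thread
}

A production version would use edge-triggered events, re-arm sockets
after each read, and bound the queue; the point here is only the
shape: one blocking epoll_wait() loop, a small shared queue, and one
worker thread per CPU.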

