The Disruptor framework vs the complexities of concurrency

Dmitry Olshansky dmitry.olsh at gmail.com
Fri Dec 7 07:55:50 PST 2012


On 12/7/2012 1:43 PM, deadalnix wrote:
> On Friday, 7 December 2012 at 09:03:58 UTC, Dejan Lekic wrote:
>> On Friday, 7 December 2012 at 09:00:48 UTC, Nick B wrote:
>>>
>>>> [Andrei's comment ] Cross-pollination is a good thing indeed.
>>>
>>> I came across this while searching the programme of the conference
>>> that Walter is attending in Australia.
>>>
>>>
>>> This gentleman, Martin Thompson
>>>
>>> http://www.yowconference.com.au/general/details.html?speakerId=2962
>>>
>>>
>>> The main idea, is in this paper (11 pages, pdf):
>>>
>>> http://disruptor.googlecode.com/files/Disruptor-1.0.pdf
>>>
>>>
>>> and here is a review of the architecure by Martin Fowler:
>>>
>>> http://martinfowler.com/articles/lmax.html
>>>
>>>
>>>
>>> My questions are:
>>>
>>> 1.  Is this pattern worth porting to the standard library ?
>>>
>>> 2.  Is this pattern a replacement for the 'Complexity of Concurrency' ?
>>>
>>>
>>> cheers
>>> Nick B
>>
>> You can find a few presentations from the LMAX guys on InfoQ. Pretty
>> neat stuff if you ask me. I think someone has already worked on a
>> Disruptor-like project in D.
>> I would not like to have the whole thing in Phobos, just some "basic
>> blocks".
>
> Actually, this is what should be behind message passing in Phobos, IMO.
> Much better than what we currently have.
>
> I wanted to do that at the time, but it wasn't possible without asm.
> You'll find posts about that if you dig deep enough in that newsgroup.

The Disruptor is an amazingly cool pattern.

I just think it has some limitations of applicability, though.

The producer has to read the sequence counters of all of its final 
consumers so as not to stomp on their slots. And indeed 1P-3C is the 
least shiny column in their benchmarks. I suspect that if one ups the 
number of "sink"-kind consumers it starts to give in.
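The gating rule in question can be sketched roughly as follows (a 
minimal Python sketch, in Python only for brevity; the names are mine, 
not the library's): before claiming a slot, the producer must check the 
slowest of its final consumers, so every extra sink consumer adds one 
more counter to scan.

```python
# Rough sketch (my own names) of the producer gating described above:
# before claiming sequence `next_seq` in a ring of `ring_size` slots,
# the producer checks that the slowest final consumer has already
# processed the entry this write would overwrite. The cost of the
# check grows with the number of "sink" consumers being gated on.

def can_claim(next_seq, consumer_seqs, ring_size):
    # Sequence of the entry that claiming `next_seq` would overwrite.
    wrap_point = next_seq - ring_size
    # Safe only once every gating consumer has passed that entry.
    return min(consumer_seqs) >= wrap_point
```

For example, with a ring of 8 slots and consumers at sequences 0, 3 
and 5, claiming sequence 8 reuses slot 0 and is allowed, but claiming 
sequence 9 must wait until the slowest consumer moves past sequence 1.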

Thus I think it's not a fit for arbitrary graphs of actors that are 
constructed and destroyed at run time, as in std.concurrency. Keep in 
mind that in the model we currently have, relations between actors are 
not fixed anywhere and, more importantly, not every message has to be 
seen by every consumer, even eventually.

And it's not producer-_consumers_ either: 'consumer' implies that it 
consumes the item and that other consumers won't see it.

Looking at the graphs in:

http://martinfowler.com/articles/lmax.html

it's obvious what the thing truly models: a hierarchical pipeline where 
each direct "consumer" visits each request once. Then dependent 
consumers (further down the hierarchy) can visit it, and so forth down 
to the bottom. The last ones are the "sinks" that effectively consume 
the item. The terms multi-pass and multi-stage are more to the point here.
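Under that reading, each stage gates on the one above it rather than on 
the producer directly. A hypothetical sketch of the per-stage visibility 
rule (names are mine):

```python
# Hypothetical sketch of the multi-stage visibility rule: a dependent
# stage may visit entries only up to the slowest sequence of the stage
# it depends on, while the producer gates on the bottom "sink" stage.

def stage_limit(upstream_seqs):
    # Highest sequence this stage is currently allowed to visit.
    return min(upstream_seqs)
```

So if, say, a journaler and a replicator in the first stage sit at 
sequences 7 and 5, the business-logic stage below them may read only up 
to sequence 5, the slower of the two.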

To compare these two different views:

Producer-consumer is a kind of workflow where a worker grabs material 
from an input "gate", does everything on it alone, and then passes it 
to the next "gate". Scalability is achieved by having a lot of workers, 
each of which does the whole job independently (and is able to do so!).

The advantage is that you never have a half-baked work unit: it is 
either waiting, in process, or done; there is no "semi-done" stage. 
It is also trivially balanced:
- queue accumulates too fast: grow the consumer pool, shrink the 
producer pool
- queue is near empty: grow the producer pool, shrink the consumer pool

So there is a world of tuning, but the levers are easy to understand.
In a sense this scheme scales nicely with the number of messages, just 
not with the number of *independent*, *parallel* processing steps per 
message.
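In miniature, the classic scheme might look like this (a sketch in 
Python for brevity, not tuned code; the names are mine):

```python
import queue
import threading

def run_pool(items, num_workers, work_fn):
    # Classic producer-consumer: each worker grabs a unit from the
    # input queue and does the *whole* job on it independently.
    tasks = queue.Queue()
    results = queue.Queue()

    def worker():
        while True:
            item = tasks.get()
            if item is None:   # poison pill: shut this worker down
                break
            results.put(work_fn(item))

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for it in items:
        tasks.put(it)
    for _ in threads:          # one pill per worker
        tasks.put(None)
    for t in threads:
        t.join()

    out = []
    while not results.empty():
        out.append(results.get())
    return out
```

Scaling here means changing `num_workers` (or the number of producers 
feeding `tasks`) based on queue depth, exactly the levers described above.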

The pipeline described in the Disruptor is more like a factory line 
where you have a large rotating conveyor and workers standing around 
it. It's just that in this case it is the workers moving around it 
instead of the conveyor rotating (which is better, as the speeds of 
the different stages go up and down).

Then there is one worker (or many) that puts material in and moves 
forward, while the other workers do their own passes and advance until 
they have to wait on the producer (or on the worker of the previous 
stage).

Indeed, emulating the second concept with queues that can only take/put 
is embarrassing, as you'd have to split the queue at each stage and put 
copies of references to the items into each one (in Java that's always 
the case anyway).
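Concretely, the awkward emulation would be something like this 
(hypothetical sketch, names mine): each consumer in a stage gets its 
own queue, and the producer pushes the same reference into every one 
of them.

```python
import queue

def make_fanout(num_consumers):
    # One queue per consumer in the stage: for *every* consumer to see
    # *every* item, the producer must enqueue the same reference N times.
    queues = [queue.Queue() for _ in range(num_consumers)]

    def publish(item):
        for q in queues:
            q.put(item)        # N puts for one logical message

    return queues, publish
```

Compare this with the ring-buffer approach, where one write and a few 
counter updates make the entry visible to every consumer at once.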

So the last problem is that I don't see how it cleanly scales with the 
number of messages: there is only one instance of each specific consumer 
type per stage. How do these scale if one core working on each is not 
enough?

-- 
Dmitry Olshansky


More information about the Digitalmars-d mailing list