std.concurrency and fibers

Thu Oct 4 21:27:02 PDT 2012

On 04-10-2012 22:04, Dmitry Olshansky wrote:
> On 04-Oct-12 15:32, Alex Rønne Petersen wrote:
>> Hi,
>>
>> We currently have std.concurrency as a message-passing mechanism. We
>> encourage people to use it instead of OS threads, which is great.
>> However, what is *not* great is that spawned tasks correspond 1:1 to OS
>> threads. This is not even remotely scalable for Erlang-style
>> concurrency. There's a fairly simple way to fix that: Fibers.
>>
>> The only problem with adding fiber support to std.concurrency is that
>> the interface is just not flexible enough. The current interface is
>> completely and entirely tied to the notion of threads (contrary to what
>> its module description says).
>>
>> Now, I see a number of ways we can fix this:
>>
>> A) We completely get rid of the notion of threads and instead simply
>> speak of 'tasks'. This trivially allows us to use threads, fibers,
>> whatever to back the module. I personally think this is the best way to
>> build a message-passing abstraction because it gives enough transparency
>> to *actually* distribute tasks across machines without things breaking.
>
> Cool, but currently it's a leaky abstraction. For instance, if a task is
> implemented with fibers, static variables will be shared among all the
> fibers running on the same thread. Essentially I think Fibers need TLS
> (or rather FLS) synced with the language's 'static' keyword. Otherwise
> the whole TLS-by-default design is a useless chunk of machinery.

Yeah, it's a problem all right. But we'll need compiler support for this 
stuff in any case.
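
To make the leak concrete, here is a minimal sketch of two fibers on
one thread stepping on the same thread-local variable. The names
`counter` and `observedByFirstTask` are made up for illustration; only
core.thread's Fiber is real API.

```d
import core.thread : Fiber;

// Module-level variables are thread-local by default in D, but every
// fiber running on this thread sees the same instance.
int counter;

// Hypothetical helper: returns the value the first task observes
// after the second task has run on the same thread.
int observedByFirstTask()
{
    int seen;

    auto a = new Fiber({
        counter = 1;     // the first task writes "its" static
        Fiber.yield();   // ... and gives up the thread
        seen = counter;  // on resume it reads whatever is there now
    });
    auto b = new Fiber({
        counter = 2;     // the second task overwrites the same variable
    });

    a.call(); // runs a up to the yield
    b.call(); // runs b to completion
    a.call(); // resumes a, which now sees b's write

    return seen;
}
```

Calling observedByFirstTask() returns 2, not 1: the first task's
"static" was silently changed by the second one.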

Can't help but wonder if it's really worth it. It seems to me like a 
simple AA-like API based on the typeid of data would be better -- as in, 
much more generic -- than trying to teach the compiler and runtime how 
to deal with this stuff.

Think something like this:

struct Data
{
     int foo;
     float bar;
}

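void myTask()
{
     auto data = Data(42, 42.42f);

     // TaskStore is the hypothetical per-task store; the slot is
     // selected by typeid(Data).
     TaskStore.save(data);

     // work ...

     foo();

     // work ...
}

void foo()
{
     // Same task, so this yields the Data saved in myTask().
     auto data = TaskStore.load!Data();

     // work ...
}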

I admit, not as seamless as static variables, but a hell of a lot less 
magical.
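
For what it's worth, a rough sketch of how such a TaskStore could be
backed today, assuming a task is a fiber that never migrates to
another thread. Everything about TaskStore here (the `slots` table,
keying on Fiber.getThis()) is my assumption, not an existing API; only
core.thread and std.variant are real.

```d
import core.thread : Fiber;
import std.variant : Variant;

// Hypothetical TaskStore: one slot per (task, type) pair.
struct TaskStore
{
    // Outer key: the currently running fiber (null when called from
    // outside any fiber). Inner key: the TypeInfo of the stored value.
    // The table itself is thread-local, hence the assumption that a
    // fiber stays on the thread that first ran it.
    private static Variant[TypeInfo][Fiber] slots;

    // Store a copy of `value` in the current task's slot for type T.
    static void save(T)(T value)
    {
        slots[Fiber.getThis()][typeid(T)] = Variant(value);
    }

    // Fetch the current task's value of type T. A missing entry is a
    // failed AA lookup (RangeError); a real design would want a
    // friendlier policy.
    static T load(T)()
    {
        return slots[Fiber.getThis()][typeid(T)].get!T;
    }
}
```

Note that this sketch never removes entries when a fiber terminates,
so a real implementation would have to hook task teardown somewhere.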

>
>> B) We make the module capable of backing tasks with both threads and
>> fibers, and expose an interface that allows the user to choose what kind
>> of task is spawned. I'm *not* convinced this is a good approach because
>> it's extremely error-prone (imagine doing a thread-based receive inside
>> a fiber-based task!).
> Bleh.
>
>> C) We just swap out threads with fibers and document that the module
>> uses fibers. See my comments in A for why I'm not sure this is a good
>> idea.
> Seems a lot like A but with task defined to be a fiber. I'd prefer this.
> However then it needs a user-defined policy for distributing fibers
> across real threads (pools). Btw A is full of this too.

By choosing C we effectively give up any hope of distributed tasks,
especially if we have a scheduler API. Is that really a good idea in
this day and age?

>
>> All of these are going to break code in one way or another - that's
>> unavoidable. But we really need to make std.concurrency grow up; other
>> languages (Erlang, Rust, Go, ...) have had micro-threads (in some form)
>> for years, and if we want D to be seriously usable for large-scale
>> concurrency, we need to have them too.
>>
>> Thoughts? Other ideas?
>>
> +1
>

-- 
Alex Rønne Petersen
alex at lycus.org
http://lycus.org