[dmd-concurrency] Vot de hekk is shared good for, anyway?

Thu Jan 7 22:58:01 PST 2010

Michel Fortin wrote:
> Le 2010-01-07 à 20:28, Walter Bright a écrit :
>
>   
>> Having a per-thread gc is an optimization, not a fundamental feature of the concurrency model. For one thing, it precludes casting data to immutable. For another, it may result in excessive memory consumption as one thread may have a lot of unused data in its pool that is not available for allocation by another thread.
>>     
>
> Both the "per-thread GC + shared GC" model and "the shared GC for everyone" model can be seen as optimizations. The first optimizes for speed, the second optimizes for memory usage.
>
> Depending on what you do, it might even make sense to have some threads using the shared GC for everything and others having a thread-local GC to improve speed.
>
> If you want the language to be limited to models where the memory can always be shared between all threads, then that's fine. It's your prerogative. I'm not so sure it's wise to limit shared semantics to this scenario just to avoid having the shared-immutable combo, but if you're sure that's what you want then I'll stick to it.
>
>   

There's another aspect here. Consider all the problems we have getting 
across the idea of an immutable type. What hope is there for shared? I 
see mass confusion everywhere. Frankly, I see little hope of any but a 
handful of programmers ever being able to grok shared and use it 
correctly for concurrent programs. The notion that one can just slap 
'shared' on a data type and have it work correctly across threads 
without further thought is a pipe dream.

So what to do?

I want to pin the mainstream concurrency on message passing. The message 
passing user never sees shared, never has to deal with locks, never has 
to deal with memory barriers. It just works. Message passing should be a 
robust, scalable solution for most users. I believe the Erlang 
experience validates this. Go and Scala also rely entirely on message 
passing (but they don't have immutable data, so their models are unsafe 
and I predict many rude surprises).
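
As a rough illustration of the message-passing style described above, 
here is a minimal sketch. It assumes the std.concurrency module as it 
exists in present-day Phobos, not anything specified in this thread. The 
names summer and sumViaWorker, and the use of a negative number as a 
stop signal, are hypothetical choices made only for the example. Note 
that no 'shared', lock, or memory barrier appears in the user code.

```d
import std.concurrency : ownerTid, receive, receiveOnly, send, spawn, Tid;
import std.stdio : writeln;

// Hypothetical worker: adds up the ints it is sent and, on receiving
// a negative value (the stop signal), reports the total to its owner.
void summer()
{
    int total = 0;
    bool done = false;
    while (!done)
    {
        receive(
            (int n) {
                if (n < 0)
                    done = true;
                else
                    total += n;
            }
        );
    }
    send(ownerTid, total);
}

// Sends each value to a freshly spawned worker and waits for the sum.
// Assumes all values are non-negative, since a negative int means "stop".
int sumViaWorker(int[] values)
{
    Tid worker = spawn(&summer);
    foreach (int v; values)
        send(worker, v);
    send(worker, -1);          // stop signal
    return receiveOnly!int();  // blocks until the worker replies
}

void main()
{
    writeln(sumViaWorker([1, 2, 3, 4])); // prints 10
}
```

Only plain int values cross the thread boundary here, so there is no 
data that both threads can reach at the same time.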

So why bother with shared at all?

Because message passing does not cover all the bases, and D is supposed 
to be a systems programming language. So we need a paradigm for 
synchronization and shared data structures. What shared provides is:

1. A way to identify shared data. This is incredibly important. A lot of 
sharing bugs come about because of inadvertent, unrecognized sharing of 
data. This should be pretty much impossible in D. Furthermore, if you do 
have a sharing bug in your code, you look at the 1% of the data tagged 
as shared, rather than every freakin' line of code and every piece of 
data. Half the battle in debugging code is figuring out where to look 
for the problem. Shared pares that problem down to a reasonable size.

2. Shared comes with a collection of static typing rules and guarantees, 
such as sequential consistency, that will head off a number of 
concurrency bugs.
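
The two points above can be sketched roughly as follows. This assumes 
the behavior of current D compilers and the core.atomic and core.thread 
modules of present-day druntime, which may differ from what was under 
discussion in this thread. The names localCounter, sharedCounter, bump, 
worker and runWorkers are hypothetical.

```d
import core.atomic : atomicLoad, atomicOp, atomicStore;
import core.thread : Thread;
import std.stdio : writeln;

int localCounter;          // no 'shared': every thread gets its own copy
shared int sharedCounter;  // tagged as shared: one copy, visible to all threads

enum bumpsPerThread = 1000;

void bump()
{
    localCounter += 1;  // touches only the calling thread's copy
    // sharedCounter += 1;  // recent compilers reject this plain
    //                      // read-modify-write on a shared variable
    atomicOp!"+="(sharedCounter, 1);  // explicit atomic update instead
}

void worker()
{
    foreach (i; 0 .. bumpsPerThread)
        bump();
}

// Starts nThreads workers, waits for them, and returns the shared total.
int runWorkers(int nThreads)
{
    atomicStore(sharedCounter, 0);
    Thread[] threads;
    foreach (i; 0 .. nThreads)
    {
        auto t = new Thread(&worker);
        t.start();
        threads ~= t;
    }
    foreach (t; threads)
        t.join();
    return atomicLoad(sharedCounter);
}

void main()
{
    writeln(runWorkers(4));  // prints 4000
    writeln(localCounter);   // prints 0: main's own copy was never bumped
}
```

When hunting for a sharing bug in a program like this, the only 
declaration that needs inspecting is the one marked 'shared'.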

I view shared as sort of like the latest electric arc welders which 
automatically adjust the current and wire feed for you. They 
dramatically shorten (but don't eliminate) the learning curve for people 
trying to master the art of welding. D is the only language to even 
attempt this. C++ leaves you completely on your own; Java offers no 
help; Erlang, Scala and Go throw in the towel and won't allow anything 
but message passing.

As for a shared gc vs thread local gc, I just see an awful lot of 
strange irreproducible bugs when someone passes data from one to the 
other. I doubt it's worth it, unless it can be done with compiler 
guarantees, which seems doubtful.