[dmd-concurrency] Vot de hekk is shared good for, anyway?

Fri Jan 8 08:48:08 PST 2010

On 2010-01-08 at 1:58, Walter Bright wrote:

> Michel Fortin wrote:
>> If you want the language to be limited to models where the memory can always be shared between all threads, then that's fine. It's your prerogative. I'm not so sure it's wise to limit shared semantics to this scenario just to avoid having the shared-immutable combo, but if you're sure that's what you want then I'll stick to it.
> 
> There's another aspect here. Consider all the problems we have getting across the idea of an immutable type. What hope is there for shared? I see mass confusion everywhere. Frankly, I see little hope of any but a handful of programmers ever being able to grok shared and use it correctly for concurrent programs. The notion that one can just slap 'shared' on a data type and have it work correctly across threads without further thought is a pipe dream.
> 
> So what to do?
> 
> I want to pin the mainstream concurrency on message passing. The message passing user never sees shared, never has to deal with locks, never has to deal with memory barriers. It just works. Message passing should be a robust, scalable solution for most users. I believe the Erlang experience validates this. Go and Scala also rely entirely on message passing (but they don't have immutable data, so their models are unsafe and I predict many rude surprises).

I agree that message passing should be the preferred method for concurrency. It's the easiest to understand, and it scales well, even at internet scale. I also agree that it's much better if shared isn't needed anywhere when using the message passing API. But does this last constraint break the idea of thread-local pools? Not at all.

First of all, there will be a mechanism in the message passing system to copy the data when needed, because a pool of memory shared between the sender and the receiver will not always exist (for instance when you're communicating with another computer).

Second, I trust the messaging API will do everything it can to avoid copying memory when it isn't necessary. If the runtime supports changing non-shared to shared -- because there is one global memory pool, or because it can somehow transfer the ownership -- then it won't copy the data: it'll pass the pointer instead. If the data cannot be shared, it'll be because the runtime doesn't offer the feature, or because of other limitations, and the copying mechanism will take over.

So the compiler doesn't need to conflate shared-immutable with immutable to keep the message passing API simple. Whether to copy or to pass the pointer can be decided by the message passing API itself, depending on the runtime's abilities. When you know the runtime doesn't support sharing data previously allocated as not-shared, then it may be more efficient to create your message as a shared object from the start to avoid copying, but 1) that doesn't change anything for the scenario where everything comes from the same memory pool, and 2) if your messages are short you won't care whether they're copied or shared anyway.
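
To make the copy-or-pass decision concrete, here is a rough sketch of how a messaging layer might choose at the point of sending. Everything in it is hypothetical: `RuntimeCaps`, `canShareLocal` and `prepareMessage` are names invented for illustration, not anything in druntime or Phobos, and a real implementation would query the runtime rather than take a flag.

```d
// Hypothetical sketch: none of these names exist in druntime or Phobos.

/// Assumed description of what the runtime can do with thread-local data.
struct RuntimeCaps
{
    /// True when there is one global pool, or ownership can be transferred.
    bool canShareLocal;
}

/// Returns the bytes the receiver will see: the very same buffer when the
/// runtime can share it, a fresh copy otherwise.
immutable(ubyte)[] prepareMessage(immutable(ubyte)[] payload, RuntimeCaps caps)
{
    if (caps.canShareLocal)
        return payload;   // pass the pointer, no copy
    return payload.idup;  // fall back to copying
}
```

The caller's code is the same in both cases; only the messaging layer knows which path was taken.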

So the messaging API can stay simple even if the language itself does not combine shared-immutable and immutable into one. Using a thread-local GC is just a matter of choosing a different tradeoff, and supporting it doesn't affect performance when it's not needed.

> So why bother with shared at all?
> 

[[here goes a lot of things I agree with]]

> As for a shared gc vs thread local gc, I just see an awful lot of strange irreproducible bugs when someone passes data from one to the other. I doubt it's worth it, unless it can be done with compiler guarantees, which seem doubtful.

I agree, and I think it can be done with compiler guarantees, as long as shared-immutable is treated differently from immutable by the compiler (and you don't do stupid casts).

That doesn't mean it'll become hard to use immutable. Here are the rules I propose:

* "shared immutable" implicitly converts to "immutable".
* "immutable" can be made "shared immutable" explicitly with a function template in druntime, if supported by the runtime. Otherwise it needs to be copied using APIs of some sort.
* "immutable" global variables can be promoted to "shared immutable" at compile-time because it doesn't break anything and is more efficient.

With that I'm pretty sure the only code that will have to use "shared immutable" is code that needs to deal with "shared" anyway (like inside the message passing system). Any code using just "immutable" will accept "shared immutable" whenever it likes because of the implicit cast.
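
The first two rules can be modelled roughly with wrapper types. This is only a model: `Immutable`, `SharedImmutable`, `makeShared` and `read` are hypothetical names, the wrappers stand in for the proposed qualifiers (which the compiler would handle itself), and `makeShared` here always takes the copying path, as if the runtime could not share thread-local memory. It is restricted to `int` to keep the sketch short.

```d
// Hypothetical model of the proposed rules; these are not real qualifiers.

/// Stands in for plain (possibly thread-local) "immutable".
struct Immutable(T)
{
    immutable(T)* ptr;
}

/// Stands in for "shared immutable".
struct SharedImmutable(T)
{
    immutable(T)* ptr;

    // Rule 1: "shared immutable" implicitly converts to "immutable".
    Immutable!T toLocal() { return Immutable!T(ptr); }
    alias toLocal this;
}

/// Rule 2: explicit promotion through a function template.
/// Assumption: the runtime cannot share the original, so we copy.
SharedImmutable!T makeShared(T)(Immutable!T value)
{
    auto copy = new immutable(T)(*value.ptr);
    return SharedImmutable!T(copy);
}

/// Ordinary code written against plain "immutable" only.
int read(Immutable!int v)
{
    return *v.ptr;
}
```

With this model, `read` accepts a `SharedImmutable!int` without any change, while going the other way requires an explicit call to `makeShared`.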

-- 
Michel Fortin
michel.fortin at michelf.com
http://michelf.com/