[dmd-concurrency] shared arrays

Thu Jan 14 12:36:45 PST 2010

On Jan 14, 2010, at 12:20 PM, Steve Schveighoffer wrote:

> ----- Original Message ----
> 
>> From: Andrei Alexandrescu <andrei at erdani.com>
>> 
>> Steve Schveighoffer wrote:
>>> or these?
>>> 
>>> b.dup; T t; b ~= t; b ~= a;
>>> 
>>> But really, I don't understand why it's ok to copy i.e. a
>>> shared(char)[] where half of it might have changed, but it's not OK
>>> to copy an array of large types.
>> 
>> The array does not enforce any particular relationship between its members.
> 
> A shared(char)[] does have a relationship between its members -- it's a string!
> 
> example of code that is destined for invalid output:
> 
> void foo(shared(char)[] input, shared(char)[] output)
> {
>   output[] = input[];
> }
> 
> void main()
> {
>   shared(char)[] a = cast(shared(char)[])"hello".dup;
>   shared(char)[] b = cast(shared(char)[])"world".dup;
>   shared(char)[] c = new shared(char)[5];
> 
>   auto ta = spawn(&foo, a, b);
>   auto tb = spawn(&foo, b, c);
>   ta.wait(); // unsure of API for this
>   tb.wait();

Hm... I hadn't planned to add a wait() call for stuff exposed by spawn, but I suppose it's a logical extension of watch(tid) (i.e. "please notify me when this thread exits"), which we were going to provide.  Adding a wait() wrapper for this would be trivial.

>   writeln(c);
> }
> 
> I'd expect output to match the regex [hw][eo][lr]l[od]

It should be.

> Note also that a single code point in a shared(char)[] can span several elements (UTF-8 code units)!  You could easily generate an invalid string that way (essentially tearing).
> 
> The problem is, given an array of items, the compiler doesn't know whether it's the array that must be atomic, the elements that must be atomic, or some other relationship that must hold (such as a group of elements forming one UTF-8 code point).  It should either refuse copying any data, or allow copying any data.  Making a decision based on assumptions about the array's semantic meaning doesn't seem right to me.
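
To make the tearing case concrete, here is a minimal single-threaded sketch that simulates a copy interrupted partway through.  The helper names (isValidUtf8, tornCopy) are made up for illustration, and the "tear" is just a deterministic splice of two buffers, not a real race:

```d
import std.utf : validate, UTFException;

// Hypothetical helper: true if s is well-formed UTF-8.
bool isValidUtf8(const(char)[] s)
{
    try
    {
        validate(s);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

// Hypothetical helper: simulate a reader that sees the first `split`
// code units of the new value and the rest of the old value.
// Assumes both buffers have the same length.
char[] tornCopy(const(char)[] oldValue, const(char)[] newValue, size_t split)
{
    assert(oldValue.length == newValue.length);
    auto result = new char[oldValue.length];
    result[0 .. split] = newValue[0 .. split];
    result[split .. $] = oldValue[split .. $];
    return result;
}
```

With oldValue = "\u00E9" (two code units, C3 A9) and newValue = "ab", a split at index 1 yields 'a' followed by a lone continuation byte, which is not valid UTF-8 even though both inputs were.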

D allows non-UTF-8 data in a char[], so I don't see any reason for it to try to guarantee any meaningful result from such an operation.  Earlier, I had been thinking it might be nice to have this though:

shared(char)[] a, b;

synchronized( lock( a, b ) ) {
    // some fancy algorithm on a and b
}

Basically, use the hashtable of mutexes discussed earlier to allow users to obtain locks on a set of N arrays in a safe manner (because expecting them to do it manually will generally result in deadlock).  This makes what's happening explicit and allows the whole mess to be handled in library code.  In theory, this same approach could work for any reference type.  The optimization issue would be making gc_query() not need to obtain the GC lock to return a valid result (this may be safe already, I haven't spent the time to figure it out).
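
A rough library-only sketch of the idea, under several assumptions: the table size, the address hash, and the names slotOf, lockOrder, withLocks and copyLocked are all invented for illustration, and it keys on the slice's .ptr rather than on the block base that a GC query would return.  Deadlock is avoided by always acquiring the slots in ascending order and locking each slot only once:

```d
import core.sync.mutex : Mutex;
import std.algorithm : sort, uniq;
import std.array : array;

enum size_t slotCount = 64;          // assumed table size
__gshared Mutex[slotCount] slots;    // the "hashtable of mutexes"

shared static this()
{
    foreach (ref m; slots)
        m = new Mutex;
}

// Map an address to a slot (assumed hash: drop the low 4 bits).
size_t slotOf(size_t addr)
{
    return (addr >> 4) % slotCount;
}

// Sorted, de-duplicated slot indices for a set of addresses.
size_t[] lockOrder(size_t[] addrs...)
{
    size_t[] order;
    foreach (a; addrs)
        order ~= slotOf(a);
    sort(order);
    return order.uniq.array;
}

// Run dg while holding the locks for every address given.
void withLocks(scope void delegate() dg, size_t[] addrs...)
{
    auto order = lockOrder(addrs);
    foreach (i; order)
        slots[i].lock();
    scope (exit)
    {
        foreach_reverse (i; order)
            slots[i].unlock();
    }
    dg();
}

// Example use: copy src into dst with both arrays locked.
void copyLocked(shared(char)[] dst, shared(char)[] src)
{
    withLocks({
        // Both locks are held, so shared is cast away for the copy.
        (cast(char[]) dst)[] = (cast(char[]) src)[];
    }, cast(size_t) dst.ptr, cast(size_t) src.ptr);
}
```

Two arrays that hash to the same slot simply share a mutex, which costs some concurrency but not correctness.  Note this only helps if every thread touching the arrays goes through the same table.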