What To Do About Shared?

Wed Mar 23 06:46:04 PDT 2011

On 3/23/2011 9:09 AM, Jason House wrote:
> dsimcha Wrote:
>
>> Some discussions about std.parallelism have prompted an examination of
>> how far D's guarantees against low level data races should extend and
>> how safety and practicality should be balanced.
>
> I didn't follow the review of std.parallelism, can you give some specific examples?
>
> Users of languages look to standard libraries as a model for how to write their own apps. I don't like your proposal and think std.parallelism should use shared properly. I'd like to understand better what your issues with shared were. I've done a decent amount of shared-correct code, so I'm pretty sure it's usable. In fact, the only really nasty bug I had could have been caught if std.thread had been shared-correct...

I have already decided that, unless shared is drastically improved in
ways I don't foresee (I'm not sure exactly how; this would need to be
discussed), I will not be making std.parallelism shared-correct.  I
put some serious thought into this issue before making the proposal and 
concluded that, for even moderately fine-grained parallelism, shared 
would get in the way more than it helps.  shared has its place if you're 
primarily using message passing and using shared state in only a very 
limited number of places, but IMHO that's the only way it helps more 
than it gets in the way.

If it comes down to a choice between the two (I hope I don't have to 
make this choice), I'd rather have std.parallelism be a useful 3rd party 
lib than an unusable bondage-and-discipline Phobos module.  If someone 
else wants to fork it and try to make it shared correct, that's their 
prerogative.

Remember, D is a **SYSTEMS LANGUAGE**.  There is no excuse for it going 
out of its way to make certain paradigms as difficult as possible, or 
not supporting them, just because they're dangerous.  If that's the 
direction we're going in, why don't we rip pointer arithmetic, inline 
ASM, unsafe casts, manual memory management, etc. out of the language 
and call ourselves Java++?  IMHO making shared-correctness mandatory 
unless you fight the type system every inch of the way would be going in 
that direction with regard to concurrency.  core.thread is a low-level 
druntime module.  If you wanted shared-correct multithreading, you 
should have been using std.concurrency.  If std.concurrency wasn't 
enough to get the job done, then that's proof that shared is only useful 
if you're mostly using message passing and occasionally shared state.
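
For readers who have not used it, the message-passing style referred to above can be sketched roughly as follows. This is only an illustration: `worker` and `squareInWorker` are hypothetical names made up for this sketch, and the only library calls assumed are `spawn`, `send`, `receiveOnly` and `thisTid` from std.concurrency.

```d
import std.concurrency : spawn, send, receiveOnly, thisTid, Tid;

// Hypothetical worker: all of its state is thread-local, and the only
// communication with the owner thread is through messages.
void worker(Tid owner)
{
    auto n = receiveOnly!int();   // block until the owner sends an int
    owner.send(n * n);            // reply with the result
}

// Hypothetical helper: run one computation in a spawned thread.
int squareInWorker(int n)
{
    auto tid = spawn(&worker, thisTid);
    tid.send(n);
    return receiveOnly!int();     // block until the worker replies
}

void main()
{
    assert(squareInWorker(7) == 49);
}
```

No mutable data is visible to both threads here, which is why this style needs no shared qualifiers at all.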

Some examples:

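// This is my example for parallel foreach.
import std.math, std.parallelism;

auto logs = new double[1_000_000];

foreach(i, ref elem; parallel(logs)) {
     // i + 1.0 makes the argument a double, avoiding an integer
     // argument to log().
     elem = log(i + 1.0);
}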

Here you have multiple threads writing to the same array in parallel. 
They're guaranteed never to write to the same element, though, making it 
safe except on some obscure/ancient hardware that we don't care about 
(e.g. old DEC Alphas) that can't write to memory at byte granularity.

Yes, I'm aware of the false sharing issue with writing to adjacent 
addresses from different threads.  This is not a problem for this 
example because these falsely shared writes will be such a small portion 
of all writes that the performance impact is negligible.  Making all 
updates atomic/fenced/whatever shared does would be a huge performance 
hit for no benefit, and would make the code more verbose and type heavy.
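
One way to check the claim that each element is written by exactly one thread is to compare the parallel loop against a serial one. The following is a minimal sketch under that assumption; `serialLogs` and `parallelLogs` are hypothetical helper names introduced only for this comparison.

```d
import std.math : log;
import std.parallelism : parallel;

// Hypothetical helper: fill the array on the calling thread only.
double[] serialLogs(size_t n)
{
    auto result = new double[n];
    foreach (i, ref elem; result)
        elem = log(i + 1.0);
    return result;
}

// Hypothetical helper: same loop body, but iterations are distributed
// over the default task pool.  Each index is handled by one thread only.
double[] parallelLogs(size_t n)
{
    auto result = new double[n];
    foreach (i, ref elem; parallel(result))
        elem = log(i + 1.0);
    return result;
}

void main()
{
    // Every element is computed from its own index alone, so both
    // versions should produce identical arrays.
    assert(serialLogs(10_000) == parallelLogs(10_000));
}
```

Note that the array is a plain double[] in both versions; nothing in the type distinguishes the parallel case.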

// This is a parallel quick sort.  Again, it writes to a data
// structure from multiple threads, but in a way that guarantees no
// element is "owned" by two threads at the same time.
import std.algorithm, std.parallelism;

void parallelSort(T)(T[] data) {
     // Sort small subarrays serially.
     if(data.length < 100) {
          std.algorithm.sort(data);
          return;
     }

     // Partition the array.
     swap(data[$ / 2], data[$ - 1]);
     auto pivot = data[$ - 1];
     bool lessThanPivot(T elem) { return elem < pivot; }

     auto greaterEqual = partition!lessThanPivot(data[0..$ - 1]);
     swap(data[$ - greaterEqual.length - 1], data[$ - 1]);

     auto less = data[0..$ - greaterEqual.length - 1];
     greaterEqual = data[$ - greaterEqual.length..$];

     // Execute both recursion branches in parallel.
     auto recurseTask = task!(parallelSort)(greaterEqual);
     taskPool.put(recurseTask);
     parallelSort(less);
     recurseTask.yieldForce();
}

// Read in a file in a background thread and return the results in a
// mutable, non-shared array that the caller can then process further.
import std.file, std.parallelism;

void main() {
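     // Create and submit a Task object for reading foo.txt.
     // (The alias form is used because read may be a template in
     // newer Phobos versions, in which case &read does not compile.)
     auto file1Task = task!read("foo.txt");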
     file1Task.executeInNewThread();

     // Read bar.txt in parallel.
     auto file2Data = read("bar.txt");

     // Get the results of reading foo.txt.
     auto file1Data = file1Task.yieldForce();
}
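
The same pattern can be tried without touching the file system. The sketch below assumes a hypothetical function `makeSquares` standing in for `read`; the point is only that the result comes back as an ordinary mutable array that the calling thread can modify.

```d
import std.parallelism : task;

// Hypothetical stand-in for read(): builds an array in whichever
// thread runs it.
int[] makeSquares(int n)
{
    auto result = new int[n];
    foreach (i, ref x; result)
        x = cast(int)(i * i);
    return result;
}

void main()
{
    // Run makeSquares in a new thread.
    auto squaresTask = task!makeSquares(5);
    squaresTask.executeInNewThread();

    // The result is a plain int[], not shared(int)[], so the caller
    // can keep working on it with no casts.
    int[] squares = squaresTask.yieldForce();
    squares[0] = 42;
    assert(squares == [42, 1, 4, 9, 16]);
}
```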