review of std.parallelism
Michel Fortin
michel.fortin at michelf.com
Sat Mar 19 11:14:51 PDT 2011
On 2011-03-19 13:20:00 -0400, dsimcha <dsimcha at yahoo.com> said:
> On 3/19/2011 1:09 PM, Michel Fortin wrote:
>> For instance:
>>
>> import std.parallelism, std.stdio;
>>
>> void main() {
>>     int sum = 0;
>>     foreach (int value; taskPool.parallel([0,2,3,6,1,4,6,3,3,3,6])) {
>>         sum += value;
>>     }
>>     writeln(sum);
>> }
>>
>> The "+=" would need to be an atomic operation to avoid low-level races.
>>
>> I think that ParallelForeach's opApply should only accept a shared
>> delegate. I define a shared delegate as a delegate that does not
>> reference any non-shared variables of its outer scope. The problem is
>> that DMD currently doesn't know how to determine whether a delegate
>> literal is shared or not, so a delegate literal is never shared, and
>> if ParallelForeach's opApply asked for a shared delegate, as it
>> should, it would simply not work. Fix DMD to create shared delegate
>> literals where appropriate and everything can be guaranteed race-free.
>
> If you want pedal-to-metal parallelism without insane levels of
> verbosity, you can't have these safety features.
I'm not sure where my proposal asks for more verbosity or less
performance. All I can see is that it would mean a few fewer casts in
std.parallelism and that it would disallow the case in my example
above, which is totally wrong. Either you're interpreting it wrong or
there are things I haven't thought about (and I'd be happy to learn
about them).
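
For what it's worth, the sum in that example doesn't even need a shared
accumulator: the existing API can already express it as a reduction. A
minimal sketch (using taskPool.reduce as documented in std.parallelism):

import std.parallelism, std.stdio;

void main() {
    // Each worker reduces its own chunk into a private partial sum;
    // the partial sums are combined at the end, so nothing races.
    auto values = [0, 2, 3, 6, 1, 4, 6, 3, 3, 3, 6];
    auto sum = taskPool.reduce!"a + b"(values);
    writeln(sum);
}

So requiring the foreach body to be a shared delegate wouldn't cost
anything here; it would simply steer people toward a formulation that
is already race-free.
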
Looking at all the examples in the documentation, I cannot find one
that would need to be changed... well, except the one I'll discuss
below.
> I thought long and hard about this issue before submitting this lib for
> review and concluded that any solution would make std.parallelism so
> slow, so limited and/or such a PITA to use that I'd rather it just punt
> these issues to the programmer. In practice, parallel foreach is used
> with very small, performance-critical snippets that are fairly easy to
> reason about even without any language-level race safety.
I'm not too convinced about the "I know what I'm doing" argument when I
look at this example from asyncBuf's documentation:
auto lines = File("foo.txt").byLine();
auto duped = map!"a.idup"(lines);  // Necessary b/c byLine() recycles buffer

// Fetch more lines in the background while we process the lines already
// read into memory into a matrix of doubles.
double[][] matrix;
auto asyncReader = taskPool.asyncBuf(duped);

foreach (line; asyncReader) {
    auto ls = line.split("\t");
    matrix ~= to!(double[])(ls);
}
Look at the last line of the foreach body: you are appending to a
non-shared array from many different threads. How is that not a race
condition? With my proposal, the compiler would catch it, because
opApply would require the foreach body to be a shared delegate, and
reading or writing the non-shared variable "matrix" from the outer
scope inside a shared delegate literal would be an error.
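
And if those appends really can run from different threads, the only
way to make them safe today is for the programmer to synchronize them
by hand. A rough sketch of what that looks like, continuing the example
above (the mutex and the 'row' variable are mine, not from the docs):

import core.sync.mutex;

double[][] matrix;
auto matrixLock = new Mutex;

foreach (line; asyncReader) {
    auto row = to!(double[])(line.split("\t"));
    synchronized (matrixLock) {
        matrix ~= row;  // the append now happens under the lock
    }
}

That is exactly the kind of thing the compiler could tell you that you
need, instead of silently accepting the unsynchronized version.
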
I'm not sure how hard it would be to implement this in the compiler,
but I think it's the right thing to do. Once the compiler can do it we
can have safety; until then I'd agree to let std.parallelism stay
unsafe.
--
Michel Fortin
michel.fortin at michelf.com
http://michelf.com/