review of std.parallelism

Sat Mar 19 14:33:05 PDT 2011

On 2011-03-19 15:36:24 -0400, dsimcha <dsimcha at yahoo.com> said:

> On 3/19/2011 2:35 PM, Michel Fortin wrote:
>> On 2011-03-19 13:16:02 -0400, dsimcha <dsimcha at yahoo.com> said:
>> 
>>>> * "A goto from inside the parallel foreach loop to a label outside the
>>>> loop will result in undefined behavior." Would this be a bug in dmd?
>>> 
>>> No, it's because a goto of this form has no reasonable, useful
>>> semantics. I should probably mention in the docs that the same applies
>>> to labeled break and continue.
>>> 
>>> I have no idea what semantics these should have, and even if I did,
>>> given the long odds that even one person would actually need them, I
>>> think they'd be more trouble than they're worth to implement. For
>>> example, once you break out of a parallel foreach loop to some
>>> arbitrary address (and different threads can goto different labels,
>>> etc.), well, it's no longer a parallel foreach loop. It's just a bunch
>>> of completely unstructured threading doing god-knows-what.
>>> 
>>> Therefore, I slapped undefined behavior on it as a big sign that says,
>>> "Just don't do it." This also has the advantage that, if anyone ever
>>> thinks of any good, clearly useful semantics, these will be
>>> implementable without breaking code later.
>> 
>> I think an improvement over undefined behaviour would be to throw an
>> exception.
> 
> The only problem is that there's no easy, well-documented way to tell 
> from the return value of opApply whether it was a break, a goto, a 
> labeled break/continue, etc.  This would be implementable only if I 
> changed the semantics of break to also throw.  This might not be a bad 
> thing (IMHO any type of breaking out of a parallel foreach loop is just 
> silly) but others had asked for different semantics for break.

It's not that silly.

Essentially, what you'd express like this with a normal function taking 
a delegate:

	taskPool.apply([1,2,3], (int i) {
		if (i == 1)
			return;
		// do some things
	});

you'd express like this in a parallel foreach:

	foreach (int i; parallel([1,2,3])) {
		if (i == 1)
			break;
		// do some things
	}

It's not following the semantics of break within a foreach, but it's 
still useful to be able to return early from a function (from a loop 
iteration in this case), so I see the use case for making 'break' do 
what it does.

My only gripe is that the semantics are distorted. In fact, just making 
foreach parallel distorts its semantics. I was confused earlier about a 
foreach being parallel when it was not, someone could also be confused 
in the other direction, thinking foreach is the normal foreach when it 
actually is parallel. This makes code harder to review. Don't consider 
only my opinion on this, but in my opinion the first form above 
(taskPool.apply) is preferable because you absolutely can't mistake it 
with a normal foreach. And I think it's especially important to make it 
stand out given that the compiler can't check for low-level races.

>> Also, what happens if one of the tasks throws an exception?
> 
> It gets rethrown when yieldWait()/spinWait()/workWait() is called.  In 
> the case of the higher-level primitives, it gets re-thrown to the 
> calling thread at some non-deterministic point in the execution of 
> these functions.  I didn't see the need to document this explicitly 
> because it "just works".

That's indeed what I'd expect. But I think it'd still be worth 
mentioning in a short sentense in yeildWait()/spinWait()/workWait()'s 
documentation. It's comforting when the documentation confirms your 
expectations.

-- 
Michel Fortin
michel.fortin at michelf.com
http://michelf.com/