review of std.parallelism

Michel Fortin michel.fortin at michelf.com
Mon Mar 21 05:37:56 PDT 2011


On 2011-03-20 23:21:49 -0400, dsimcha <dsimcha at yahoo.com> said:

> On 3/20/2011 10:44 PM, Michel Fortin wrote:
>> 
>> I don't see a problem with the above. The array elements you modify are
>> passed through parallel's opApply which can check easily whether it's
>> safe or not to pass them by ref to different threads (by checking the
>> element's size) and allow or disallow the operation accordingly.
>> 
>> It could even do a clever trick to make it safe to pass things such as
>> elements of array of bytes by ref (by coalescing loop iterations for all
>> bytes sharing the same word into one task). That might not work for
>> ranges which are not arrays however.
>> 
>> That said, feel free to suggest more problematic examples.
> 
> Ok, I completely agree in principle, though I question whether it's 
> worth actually implementing something like this, especially until we 
> get some kind of support for shared delegates.

Well, it'll work irrespective of whether shared delegates are used or 
not. I think you could add a compile-time check that the array element 
size is a multiple of the word size when the element is passed by ref 
in the loop, and leave the clever trick as a possible future 
improvement. Would that work?
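Such a check might look something like this. (A hypothetical sketch: 
`parallelByRef` is an illustrative name, not part of std.parallelism, 
and the dispatch logic is elided.)

```d
// Hypothetical sketch of the compile-time check discussed above:
// refuse by-ref parallel iteration over elements smaller than a word,
// since adjacent elements could then share a machine word across threads.
void parallelByRef(T)(T[] data, void delegate(ref T) loopBody)
{
    static assert(T.sizeof % size_t.sizeof == 0,
        "by-ref parallel foreach requires the element size to be a "
        ~ "multiple of the word size, to avoid word-tearing races");
    // ... dispatch chunks of 'data' to worker threads here ...
}

unittest
{
    long[] a = [1, 2, 3];
    parallelByRef(a, (ref long x) { x *= 2; });  // ok: long is word-sized
    // byte[] b = [1, 2, 3];
    // parallelByRef(b, (ref byte x) {});  // would fail the static assert
}
```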


>>> Also, your example can be trivially modified to be safe.
>>> 
>>> import std.parallelism, std.stdio;
>>> 
>>> void main() {
>>>     int sum = 0;
>>>     foreach (int value; taskPool.parallel([0,2,3,6,1,4,6,3,3,3,6])) {
>>>         synchronized sum += value;
>>>     }
>>>     writeln(sum);
>>> }
>>> 
>>> In this case that kills all parallelism, but in more realistic cases I
>>> use this pattern often. I find it very common to have an expensive
>>> loop body that can be performed in parallel, except for a tiny portion that
>>> must update a shared data structure. I'm aware that it might be
>>> possible, in theory, to write this more formally using reduce() or
>>> something. However:
>>> 
>>> 1. If the portion of the loop that deals with shared data is very
>>> small (and therefore the serialization caused by the synchronized
>>> block is negligible), it's often more efficient to only keep one data
>>> structure in memory and update it concurrently, rather than use
>>> stronger isolation between threads like reduce() does, and have to
>>> maintain one data structure for each thread.
>>> 
>>> 2. In my experience synchronizing on a small portion of the loop body
>>> works very well in practice. My general philosophy is that, in a
>>> library like this, dangerous but useful constructs must be supported
>>> and treated as innocent until proven guilty, not the other way round.
>> 
>> Your second example is not really a good justification of anything. I'll
>> refer you to how synchronized classes work. It was decided that
>> synchronized in a class protects everything that is directly stored in
>> the class. Anything behind an indirection is considered shared by the
>> compiler. The implication of this is that if you have an array or a
>> pointer to something that you want semantically to be protected by the
>> class's mutex, you have to cast things to unshared. It was decided that
>> things should be safe against low-level races first, and convenience was
>> relegated as a secondary concern. I don't like it very much, but that's
>> what was decided and written in TDPL.
> 
> I'd go a little further.  If the guarantees that shared was supposed to 
> provide are strong, i.e. apply no matter what threading module is used, 
> then I utterly despise it.  It's one of the worst decisions made in the 
> design of D.  Making things pedantically strict, so that the type 
> system gets in the way more than it helps, encourages the user to 
> reflexively circumvent the type system without thinking hard about 
> doing this, thus defeating its purpose.  (The alternative of always 
> complying with what the type system "expects" you to do is too 
> inflexible to even be worth considering.)  Type systems should err on 
> the side of accepting a superset of what's correct and treating code as 
> innocent until proven guilty, not the other way around.  I still 
> believe this even if some of the bugs it could be letting pass through 
> might be very difficult to debug.  See the discussion we had a few 
> weeks ago about implicit integer casting and porting code to 64-bit.

I agree with you that this is a serious problem. I think part of why it 
hasn't been discussed much yet is that nobody is using D2 seriously for 
multithreaded stuff at this time (apart from you, I guess), so we're 
missing experience with it. Andrei seems to think it's fine to require 
casts as soon as you need to protect something behind an indirection 
inside synchronized classes, with the mitigating measure that you can 
make classes share their mutex (not implemented yet, I think), so if the 
indirection leads to a class it is less of a problem. Personally, I 
don't think it's fine.
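For illustration, this is the kind of cast in question. (A sketch of the 
synchronized-class rule as described in TDPL, which is not fully 
implemented in current compilers; `Journal` and its members are made-up 
names.)

```d
// Per TDPL, inside a synchronized class only directly stored members are
// considered protected by the object's mutex. Anything behind an
// indirection is still typed as shared, so code that semantically owns
// it must cast that away by hand.
synchronized class Journal
{
    private int count;       // direct member: protected by this mutex
    private int[] entries;   // behind an indirection: elements stay shared

    void set(size_t i, int value)
    {
        ++count;  // fine: directly stored in the class
        // The elements of 'entries' are considered shared even though,
        // semantically, the class mutex protects them. Mutating them
        // requires casting away shared:
        auto list = cast(int[]) entries;
        list[i] = value;  // write through the unshared view
    }
}
```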


> My excuse for std.parallelism is that it's pedal-to-metal parallelism, 
> so it's more acceptable for it to be dangerous than general case 
> concurrency.  IMHO when you use the non- at safe parts of std.parallelism 
> (i.e. most of the library), that's equivalent to casting away shared in 
> a whole bunch of places.  Typing "import std.parallelism;" in a 
> non- at safe module is an explicit enough step here.

I still think this "pedal-to-metal" qualification needs to be 
justified. Not having shared delegates in the language seems like an 
appropriate justification to me. Wanting to bypass the casts you would 
normally have to perform around synchronized, as the sole reason, seems 
like a bad justification to me.

It's not that I like how synchronized works, it's just that I think it 
should work the same everywhere.


> The guarantee is still preserved that, if you only use std.concurrency 
> (D's flagship "safe" concurrency module) for multithreading and don't 
> cast away shared, there can be no low level data races. IMHO this is 
> still a substantial accomplishment in that there exists a way to do 
> safe, statically checkable concurrency in D, even if it's not the 
> **only** way concurrency can be done.  BTW, core.thread can also be 
> used to get around D's type system, not just std.parallelism.  If you 
> want to check that only safe concurrency is used, importing 
> std.parallelism and core.thread can be grepped just as easily as 
> casting away shared.

Unless I'm mistaken, the only thing that bypasses race-safety in 
core.thread is the Thread constructor that takes a delegate, which 
means it could easily be made race-safe by making that delegate 
parameter shared (once shared delegates are implemented).
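Concretely, the hole is just this (using the actual core.thread Thread 
constructor; whether capturing unshared locals this way should be an 
error is exactly the point under discussion):

```d
import core.thread;

void main()
{
    int counter;  // thread-local by type
    // The Thread constructor accepts a plain (unshared) delegate, so the
    // closure hands the new thread direct access to 'counter' with no
    // cast and no shared qualifier anywhere in sight:
    auto t = new Thread({ counter += 1; });
    t.start();
    t.join();
    // Making this constructor parameter a shared delegate would force
    // such a race to be visible in the type system.
}
```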


> If, on the other hand, the guarantees of shared are supposed to be weak 
> in that they only apply to programs where only std.concurrency is used 
> for multithreading, then I think strictness is the right thing to do. 
> The whole point of std.concurrency is to give strong guarantees, but if 
> you prefer more dangerous but more flexible multithreading, other 
> paradigms should be readily available.

I think the language as a whole is designed to have strong guarantees; 
otherwise synchronized classes wouldn't require casts that step outside 
those guarantees at every indirection.

I'm not too pleased with the way synchronized classes are supposed to 
work, nor am I too pleased with how it impacts the rest of the 
language. But if this is a problem (and I think it is), it ought to be 
fixed globally, not by shutting down safeties in every module dealing 
with multithreading that isn't std.concurrency.


> I'm **still** totally confused about how shared is supposed to work, 
> because I don't have a fully debugged/implemented implementation or 
> good examples of stuff written in this paradigm to play around with.

I think nobody has played much with the paradigm at this point, or 
we'd have heard some feedback. Well, actually we have your feedback, 
which seems to indicate that it's better to shut off the safeties than 
to play nice with them.

 - - -

Quoting Andrei, February 4, 2010, in "there is no escape" on the 
dmd-concurrency mailing list:

> As we already knew, shared/synchronized limit quite drastically the 
> range of lock-based designs without casts. Fortunately, using a class 
> reference member inside a synchronized object will be possible 
> because... well I'll explain in the text.
> 
> I continue to believe this is the right bet to make, but I expect push 
> back from experienced lock-based programmers.

Now is the beginning of that push back, I guess.


-- 
Michel Fortin
michel.fortin at michelf.com
http://michelf.com/



More information about the Digitalmars-d mailing list