review of std.parallelism

Fri Mar 18 21:40:08 PDT 2011

Thanks for the advice.  You mentioned in the past that the documentation 
was inadequate but didn't give enough specifics as to how until now.  As 
the author of the library, things seem obvious to me that don't seem 
obvious to anyone else, so I don't feel that I'm in a good position to 
judge the quality of the documentation and where it needs improvement. 
I plan to fix most of the issues you raised.  Below, I've left comments
only on the few that I can't or won't fix, or that I believe are based
on misunderstandings.

On 3/18/2011 11:29 PM, Andrei Alexandrescu wrote:
> 1. Library proper:
>
> * "In the case of non-random access ranges, parallel foreach is still
> usable but buffers lazily to an array..." Wouldn't strided processing
> help? If e.g. 4 threads the first works on 0, 4, 8, ... second works on
> 1, 5, 9, ... and so on.

You can have this if you want, by setting the work unit size to 1. 
Setting it to a larger size just causes more elements to be buffered, 
which may be more efficient in some cases.
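
To make that concrete, here is a minimal sketch of parallel foreach over
a non-random-access range with an explicit work unit size.  It assumes
the taskPool.parallel(range, workUnitSize) form; sumEvens and isEven are
hypothetical names made up for the illustration, and the atomic add is
only there to keep the toy example race-free.

```d
import core.atomic : atomicLoad, atomicOp;
import std.algorithm : filter;
import std.parallelism : taskPool;
import std.range : iota;

bool isEven(int n) { return n % 2 == 0; }

// Hypothetical helper: sum the even numbers in [0, limit) in parallel.
// filter() yields a forward range, not a random access one, so
// parallel foreach has to buffer elements before handing them out.
long sumEvens(int limit, size_t workUnitSize)
{
    shared long total = 0;
    auto evens = iota(limit).filter!isEven;

    // workUnitSize == 1 hands out one element at a time; a larger
    // value buffers that many elements per work unit.
    foreach (n; taskPool.parallel(evens, workUnitSize))
    {
        atomicOp!"+="(total, n);
    }
    return atomicLoad(total);
}

void main()
{
    // Same result whichever work unit size is chosen.
    assert(sumEvens(10, 1) == sumEvens(10, 4));
}
```

Which work unit size is faster depends on how expensive the loop body
is relative to the cost of handing out a work unit.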

>
> * I'm unclear on the tactics used by lazyMap. I'm thinking the obvious
> method should be better: just use one circular buffer. The presence of
> two dependent parameters makes this abstraction difficult to operate with.
>
> * Same question about asyncBuf. What is wrong with a circular buffer
> filled on one side by threads and consumed from the other by the
> client? I can think of a couple of answers but it would be great if they
> were part of the documentation.

Are you really suggesting I give detailed rationales for implementation
decisions in the documentation?  Anyhow, there are two reasons for this
choice.  First, it avoids needing synchronization/atomic ops/etc. on
every write to the buffer, which a circular buffer would need since it
can be read and written concurrently and we'd have to track whether
there is space to write to.  Second, parallel map works best when it
operates on relatively large buffers, resulting in minimal
synchronization overhead per element.  (Under the hood, the buffer is
filled and then eager parallel map is called.)
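
A rough sketch of the fill-then-map idea, for illustration only.  This
is not the library's implementation: bufferedSquares and square are
hypothetical names, and the sketch shows only the "fill a buffer from
one thread, then run an eager parallel map over it" step, assuming the
eager map is spelled taskPool.amap.

```d
import std.parallelism : taskPool;
import std.range : iota;

int square(int x) { return x * x; }

// Hypothetical helper: consume `source` one buffer at a time and map
// each full buffer eagerly in parallel.
int[] bufferedSquares(R)(R source, size_t bufSize)
{
    assert(bufSize > 0, "buffer size must be positive");
    int[] results;

    while (!source.empty)
    {
        // Fill the buffer from a single thread, so no atomic ops or
        // locks are needed on individual writes.
        int[] buf;
        buf.reserve(bufSize);
        while (!source.empty && buf.length < bufSize)
        {
            buf ~= source.front;
            source.popFront();
        }

        // One eager parallel map per buffer: the synchronization cost
        // is paid per buffer rather than per element.
        results ~= taskPool.amap!square(buf);
    }
    return results;
}

void main()
{
    assert(bufferedSquares(iota(5), 2).length == 5);
}
```

A larger bufSize means fewer parallel map calls for the same input,
at the cost of more elements held in memory at once.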

> * Why not make workerIndex a ulong and be done with it?

I doubt anyone's really going to create anywhere near 4 billion TaskPool 
threads over the lifetime of a program.  Part of the point of TaskPool 
is recycling threads rather than paying the overhead of creating and 
destroying them.  Using a ulong on a 32-bit architecture would make 
worker-local storage substantially slower.  workerIndex is how 
worker-local storage works under the hood, so it needs to be fast.
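
For context, this is roughly how client code uses worker-local storage
without ever touching workerIndex.  A minimal sketch, assuming the
workerLocalStorage / get / toRange names; sumWithWorkerLocal is a
hypothetical helper.

```d
import std.parallelism : taskPool;
import std.range : iota;

// Hypothetical helper: sum [0, n) using one accumulator per worker,
// so the loop body needs no locks or atomic ops.
long sumWithWorkerLocal(int n)
{
    // One long per worker thread, each initialized to 0.
    auto partial = taskPool.workerLocalStorage(0L);

    foreach (i; taskPool.parallel(iota(n)))
    {
        // get returns the current worker's slot by reference.
        partial.get += i;
    }

    // After the parallel loop has finished, combine the per-worker
    // results from a single thread.
    long total = 0;
    foreach (p; partial.toRange)
    {
        total += p;
    }
    return total;
}

void main()
{
    assert(sumWithWorkerLocal(100) == 4950);
}
```

The lookup of the current worker's slot happens on every get, which is
why the index behind it has to be cheap.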

> * No example for workerIndex and why it's useful.

It should just be private.  The fact that it's public is an artifact of 
when I was designing worker-local storage and didn't know how it was 
going to work yet.  I never thought to revisit this until now.  It 
really isn't useful to client code.

> * Is stop() really trusted or just unsafe? If it's forcibly killing
> threads then it's unsafe.

It's not forcibly killing threads.  As the documentation states, it has 
no effect on jobs already executing, only ones in the queue. 
Furthermore, it's needed unless makeDaemon is called.  Speaking of 
which, makeDaemon and makeAngel should probably be trusted, too.

> * defaultPoolThreads - should it be a @property?

Yes.  In spirit it's a global variable.  It requires some extra 
machinations, though, to be threadsafe, which is why it's not 
implemented as a simple global variable.
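
With property syntax it would read like an ordinary global from the
caller's side.  A small sketch, assuming defaultPoolThreads is exposed
as a getter/setter pair; setAndReadBack is a hypothetical helper.

```d
import std.parallelism : defaultPoolThreads;

// Hypothetical helper: set the default worker count and read it back.
// The assignment and the read look like plain variable accesses, but
// each goes through a function that can do the thread-safe work.
uint setAndReadBack(uint n)
{
    defaultPoolThreads = n;
    return defaultPoolThreads;
}

void main()
{
    assert(setAndReadBack(2) == 2);
}
```

Note that the value only matters if it is set before the global pool is
first used.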

> * No example for task().

???? Yes there is, for both flavors, though these could admittedly be 
improved.  Only the safe version doesn't have an example, and this is 
just a more restricted version of the function pointer case, so it seems 
silly to make a separate example for it.
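
For reference, the two flavors look roughly like this.  A minimal
sketch, assuming the result is retrieved with yieldForce (the name may
differ in the version under review); add, viaAlias and viaPointer are
hypothetical names.

```d
import std.parallelism : task, taskPool;

int add(int a, int b) { return a + b; }

// Flavor 1: the function is passed as a compile-time alias.
int viaAlias()
{
    auto t = task!add(1, 2);
    taskPool.put(t);      // submit to the global pool
    return t.yieldForce;  // wait for the result
}

// Flavor 2: the function is passed as a run-time function pointer.
int viaPointer()
{
    auto t = task(&add, 3, 4);
    taskPool.put(t);
    return t.yieldForce;
}

void main()
{
    assert(viaAlias() == 3);
    assert(viaPointer() == 7);
}
```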

> * What is 'run' in the definition of safe task()?

It's just the run() adapter function.  Isn't that obvious?