review of std.parallelism

Andrei Alexandrescu SeeWebsiteForEmail at erdani.org
Sat Mar 19 09:03:51 PDT 2011


On 03/19/2011 02:32 AM, dsimcha wrote:
> Ok, thanks again for clarifying **how** the docs could be improved. I've
> implemented the suggestions and generally given the docs a good reading
> over and clean up. The new docs are at:
>
> http://cis.jhu.edu/~dsimcha/d/phobos/std_parallelism.html

* When using "parallelism" as a common noun, prefix it with a '_' so 
ddoc doesn't underline it.

* "Most of this module completely subverts..." Vague characterizations 
("most", "completely", "some") don't belong in technical 
documentation. (For example there's either subversion going on or there 
isn't.) Also, std.concurrency and std.parallelism address different 
needs so there's little competition between them. Better: "Unless 
explicitly marked as $(D @trusted) or $(D @safe), artifacts in this 
module are not provably memory-safe and cannot be used with SafeD. If 
used as documented, memory safety is guaranteed."

* Speaking of std.concurrency vs. std.parallelism, the first paragraph 
might be something like: "This module implements high-level primitives 
for shared memory SMP parallelism. These include parallel foreach, 
parallel reduce, parallel eager map, pipelining and future/promise 
parallelism primitives. $(D std._parallelism) is best recommended when 
the same operation is to be executed in parallel over different data. 
For communication between arbitrary threads, see $(D std.concurrency)."

* Still no synopsis example that illustrates in a catchy way the most 
attractive artifacts.
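
Something along these lines might do as a synopsis — a sketch only, 
assuming the names used throughout these docs (taskPool, parallel, 
task, put, yieldWait); sumOf is a placeholder of ours:

```d
import std.math, std.parallelism, std.stdio;

double sumOf(double[] a) {
    double s = 0.0;
    foreach (x; a) s += x;
    return s;
}

void main() {
    auto nums = [1.0, 4.0, 9.0, 16.0];

    // Parallel foreach: one line turns a serial loop parallel.
    foreach (ref x; taskPool.parallel(nums)) {
        x = sqrt(x);
    }

    // Future-style Task: submit, keep working, then force the result.
    auto fut = task!sumOf(nums);
    taskPool.put(fut);
    writeln(fut.yieldWait());  // 1 + 2 + 3 + 4
}
```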

* "After creation, Task objects are submitted to a TaskPool for 
execution." I understand it's possible to use Task straight as a 
promise/future, so s/are/may be/. Also it is my understanding that 
TaskPool efficiently adapts the concrete number of CPUs to an arbitrary 
number of tasks. If that's the case, it's worth mentioning.

* "A call to workWait(), yieldWait(), or spinWait() can be used to 
retrive the return value after the function is finished executing." As 
an aside, a spell checking step would be great ("retrieve") - just put 
the text in an editor with spellchecking. I think what this means is: 
"The methods workWait(), yieldWait(), and spinWait() make sure that the 
function finishes execution and then return its result to the initiating 
thread. Each uses a different waiting strategy as detailed below."
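
Under that reading, the caller-side picture would be something like this 
(a sketch; fib is a stand-in for any expensive function):

```d
import std.parallelism;

int fib(int n) { return n < 2 ? n : fib(n - 1) + fib(n - 2); }

void main() {
    auto t = task!fib(35);
    taskPool.put(t);
    // ... do other work in this thread while fib(35) runs ...
    // yieldWait() blocks (yielding the CPU while waiting) until the
    // task is done, then hands its result back to the initiating thread.
    int result = t.yieldWait();
}
```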

* "If a Task has been submitted to a TaskPool instance, is being stored 
in a stack frame, and has not yet finished, the destructor for this 
struct will automatically call yieldWait() so that the task can finish 
and the stack frame can be destroyed safely." At this point in the doc 
the reader doesn't understand that at all because TaskPool has not been 
seen yet. The reader gets worried that she'll be essentially serializing 
the entire process by mistake. Either move this explanation down or 
provide an example.
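
An example along these lines would settle the worry — making clear that 
only the one still-pending task is waited on at scope exit, not the 
whole pool (a sketch; slowSquare is a placeholder of ours):

```d
import std.parallelism;

int slowSquare(int x) { /* pretend this is expensive */ return x * x; }

void process() {
    auto t = task!slowSquare(7);
    taskPool.put(t);
    // If process() returns before t finishes, ~Task calls yieldWait()
    // so the worker never touches a destroyed stack frame. Only this
    // one task is waited on; nothing else in the pool is serialized.
}
```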

* "Function results are returned from yieldWait() and friends by ref." 
Someone coming from C++ may be thrown off by this sudden casual use of 
"friends" and think there's a notion of friendship by reference in D. 
Better: "The forcing methods yieldWait(), workWait(), and spinWait() 
return the result by reference."

* Speaking of which, I'd replace "Wait" with "Force". Right now the 
nomenclature is far removed from futures and promises.

* Is done() a property?

* The example that reads two files at the same time should NOT use 
taskPool. It's just one task, why would the pool ever be needed? If you 
also provided an example that reads n files in memory at the same time 
using a pool, that would illustrate nicely why you need it. If a Task 
can't be launched without being put in a pool, there should be a 
possibility to do so. At my work we have a simple function called 
callInNewThread that does what's needed to launch a function in a new 
thread.
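
For reference, the whole of callInNewThread is little more than the 
following — a sketch over core.thread; the name and signature are ours, 
not std.parallelism's:

```d
import core.thread;

// Launch dg in a fresh thread; the caller joins when it needs the result.
Thread callInNewThread(void delegate() dg) {
    auto t = new Thread(dg);
    t.start();
    return t;
}

void main() {
    import std.file;
    void[] data1;
    auto t = callInNewThread({ data1 = read("foo.txt"); });
    auto data2 = read("bar.txt");  // overlap the two reads
    t.join();                      // now data1 is valid too
}
```

No pool required for a single concurrent task.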

* The note below that example gets me thinking: it is an artificial 
limitation to force users of Task to worry about scope and such. One 
should be able to create a Future object (Task I think in your 
terminology), pass it around like a normal value, and ultimately force 
it. This is the case for all other languages that implement futures. I 
suspect the "scope" parameter associated with the delegate a couple of 
definitions below plays a role here, but I think we need to work on 
providing the smoothest interface possible (possibly prompting 
improvements in the language along the way).

* I'm not sure how to interpret the docs for

ReturnType!(F) run(F, Args...)(F fpOrDelegate, ref Args args);

So it's documented but I'm not supposed to care. Why not just remove? 
Surely there must be higher-level examples that clarify that I can use 
delegates etc.

* The examples have code at top-level. That's fine for short snippets 
but not when using import etc. I recommend putting the code inside 
unittests or function bodies for such cases.

* "If you want to escape the Task object from the function in which it 
was created or prefer to heap allocate and automatically submit to the 
pool, see TaskPool.task()." I'm uncomfortable that I need to remember a 
subtle distinction of the same primitive name ("task") depending on 
membership in a TaskPool or not, which is a tenuous relationship to 
remember. How about "scopedTask()" vs. "task()"? Also, it's worth asking 
ourselves what's the relative overhead of closure allocation vs. task 
switching etc. Maybe we get to simplify things a lot at a small cost in 
efficiency.

* As I mentioned, in the definition:

Task!(run,TypeTuple!(F,Args)) task(F, Args...)(F fun, Args args);

I can't tell what "run" is.

* osReportedNcpu - a case of naming convention gone wild... how about 
totalCPUs or reportedCPUs.

* "A goto from inside the parallel foreach loop to a label outside the 
loop will result in undefined behavior." Would this be a bug in dmd?

* I would group these two together:

ParallelForeach!(R) parallel(R)(R range, size_t workUnitSize);
ParallelForeach!(R) parallel(R)(R range);

Just put a /// ditto for the second, use only one code snippet that 
illustrates two foreach loops, one with the explicit parameter and one 
without. In fact swap the two - most of the time people will use the one 
with the default work unit size. Better yet, don't specify 512, because 
later on you may come up with a better algorithm to determine the 
optimal block size.
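
That is, one snippet with two loops, common case first — a sketch, with 
the logs computation as a placeholder and the work unit size of 100 
chosen arbitrarily:

```d
import std.math, std.parallelism;

void main() {
    auto logs = new double[1_000_000];

    // Common case: let the implementation pick the work unit size.
    foreach (i, ref e; parallel(logs)) {
        e = log(i + 1.0);
    }

    // Explicit work unit size, only when tuning matters.
    foreach (i, ref e; parallel(logs, 100)) {
        e = log(i + 1.0);
    }
}
```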

* "workUnitSize: The number of elements to evaluate in a single Task. 
Must be less than or equal to bufSize, and in practice should be a 
fraction of bufSize such that all worker threads can be used." Then why 
not specify a different parameter such as a multiplier or something? The 
dependence between bufSize and workUnitSize is a sign that these two 
should be improved. If you have good reasons that the user must have the 
parameters in this form, give an example substantiating that.

* "auto myMax = taskPool.reduce!"a + b * b"(myArr);" This is not 
computing anything meaningful. To compute the sum of squares, you need 
to give the seed 0.0.
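
That is — hedging on the exact overload, assuming reduce takes an 
explicit seed as its first runtime argument:

```d
import std.parallelism;

void main() {
    auto myArr = [1.0, 2.0, 3.0];
    // With seed 0.0, "a + b * b" accumulates a sum of squares:
    // 0 + 1*1 + 2*2 + 3*3.
    auto sumSquares = taskPool.reduce!"a + b * b"(0.0, myArr);
}
```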

* Again: speed of e.g. parallel min/max vs. serial, pi computation etc. 
on a usual machine?

* The join example is fine, but the repetitive code suggests that a loop 
would be better:

import std.file;

void main() {
     auto pool = new TaskPool();
     foreach (filename; ["foo.txt", "bar.txt", "baz.txt"]) {
       pool.put(task!read(filename));
     }

     // Call join() to guarantee that all tasks are done running, the worker
     // threads have terminated and that the results of all of the tasks can
     // be accessed without any synchronization primitives.
     pool.join();

     ubyte[][] data; // void[], meh
     // Use spinWait() since the results are guaranteed to have been computed
     // and spinWait() is the cheapest of the wait functions.
     foreach (task; pool.howDoIEnumerateTasks()) {
         data ~= task.spinWait();
     }
}


Andrei

