review of std.parallelism
Andrei Alexandrescu
SeeWebsiteForEmail at erdani.org
Sat Mar 19 09:03:51 PDT 2011
On 03/19/2011 02:32 AM, dsimcha wrote:
> Ok, thanks again for clarifying **how** the docs could be improved. I've
> implemented the suggestions and generally given the docs a good reading
> over and clean up. The new docs are at:
>
> http://cis.jhu.edu/~dsimcha/d/phobos/std_parallelism.html
* When using "parallelism" as a common noun, prefix it with a '_' so
ddoc doesn't underline it.
* "Most of this module completely subverts..." Vague characterizations
("most", "completely", "some") don't belong in technical
documentation. (For example there's either subversion going on or there
isn't.) Also, std.concurrency and std.parallelism address different
needs so there's little competition between them. Better: "Unless
explicitly marked as $(D @trusted) or $(D @safe), artifacts in this
module are not provably memory-safe and cannot be used with SafeD. If
used as documented, memory safety is guaranteed."
* Speaking of std.concurrency vs. std.parallelism, the first paragraph
might be something like: "This module implements high-level primitives
for shared memory SMP parallelism. These include parallel foreach,
parallel reduce, parallel eager map, pipelining and future/promise
parallelism primitives. $(D std._parallelism) is best recommended when
the same operation is to be executed in parallel over different data.
For communication between arbitrary threads, see $(D std.concurrency)."
* Still no synopsis example that illustrates in a catchy way the most
attractive artifacts.
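Something along these lines would do as a synopsis (a sketch only; it
assumes the documented taskPool, parallel, and amap primitives behave
as described):

```d
import std.math, std.parallelism, std.range;

double sq(double x) { return x * x; }

void main() {
    // Parallel foreach: transform an array in place, spread over all cores.
    auto data = [1.0, 4.0, 9.0, 16.0];
    foreach (ref x; taskPool.parallel(data))
        x = sqrt(x);
    assert(data == [1.0, 2.0, 3.0, 4.0]);

    // Parallel eager map: evaluate sq over a range into a new array.
    auto squares = taskPool.amap!sq(iota(5.0));
    assert(squares == [0.0, 1.0, 4.0, 9.0, 16.0]);
}
```

Two short loops like these up front would show off the embarrassingly
parallel use cases immediately.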
* "After creation, Task objects are submitted to a TaskPool for
execution." I understand it's possible to use Task straight as a
promise/future, so s/are/may be/. Also, it is my understanding that
TaskPool efficiently maps an arbitrary number of tasks onto the
concrete number of CPUs. If that's the case, it's worth mentioning.
* "A call to workWait(), yieldWait(), or spinWait() can be used to
retrive the return value after the function is finished executing." As
an aside, a spell checking step would be great ("retrieve") - just put
the text in an editor with spellchecking. I think what this means is:
"The methods workWait(), yieldWait(), and spinWait() make sure that the
function finishes execution and then return its result to the initiating
thread. Each uses a different waiting strategy as detailed below."
* "If a Task has been submitted to a TaskPool instance, is being stored
in a stack frame, and has not yet finished, the destructor for this
struct will automatically call yieldWait() so that the task can finish
and the stack frame can be destroyed safely." At this point in the doc
the reader doesn't understand that at all because TaskPool has not been
seen yet. The reader gets worried that she'll be essentially serializing
the entire process by mistake. Either move this explanation down or
provide an example.
* "Function results are returned from yieldWait() and friends by ref."
Someone coming from C++ may be thrown off by this sudden casual use of
"friends" and think there's a notion of friendship by reference in D.
Better: "The forcing methods yieldWait(), workWait(), and spinWait()
return the result by reference."
* Speaking of which, I'd replace "Wait" with "Force". Right now the
nomenclature is far removed from futures and promises.
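With that renaming, the basic future pattern would read naturally (a
sketch; yieldForce is the name I'm proposing for what the docs
currently call yieldWait()):

```d
import std.parallelism;

long fib(long n) { return n < 2 ? n : fib(n - 1) + fib(n - 2); }

void main() {
    // Create a future and submit it; the pool maps any number of
    // tasks onto the fixed number of worker threads.
    auto fut = task!fib(28);
    taskPool.put(fut);

    // ... unrelated work could go here ...

    // Force the result: block (yielding) until the task is done,
    // then get the value back by reference.
    auto r = fut.yieldForce;
    assert(r == 317811);
}
```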
* Is done() a property?
* The example that reads two files at the same time should NOT use
taskPool. It's just one task, why would the pool ever be needed? If you
also provided an example that reads n files in memory at the same time
using a pool, that would illustrate nicely why you need it. If a Task
can't currently be launched without being put in a pool, there should
be a way to do so. At my work we have a simple function called
callInNewThread that does what's needed to launch a function in a new
thread.
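Such an n-files example might look like this (a sketch; it creates its
own input files so it's self-contained, and it uses yieldForce for the
forcing method per the renaming suggested above — read yieldWait() for
the current docs):

```d
import std.file, std.parallelism;

void main() {
    // Create three small input files so the example is self-contained.
    auto names = ["foo.txt", "bar.txt", "baz.txt"];
    foreach (name; names)
        std.file.write(name, "contents of " ~ name);

    // Launch one read task per file, keeping the handles in order.
    alias ReadTask = typeof(task!read(""));
    ReadTask[] tasks;
    foreach (name; names) {
        auto t = task!read(name);
        taskPool.put(t);
        tasks ~= t;
    }

    // Force each result; result i belongs to file i, and no explicit
    // join() or extra synchronization is needed.
    void[][] data;
    foreach (t; tasks)
        data ~= t.yieldForce;
    assert(cast(string) data[1] == "contents of bar.txt");

    foreach (name; names)
        std.file.remove(name);
}
```

This both motivates the pool (n tasks, a fixed number of threads) and
shows that results come back in a defined order.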
* The note below that example gets me thinking: it is an artificial
limitation to force users of Task to worry about scope and such. One
should be able to create a Future object (Task I think in your
terminology), pass it around like a normal value, and ultimately force
it. This is the case for all other languages that implement futures. I
suspect the "scope" parameter associated with the delegate a couple of
definitions below plays a role here, but I think we need to work toward
providing the smoothest interface possible (possibly prompting
improvements in the language along the way).
* I'm not sure how to interpret the docs for
ReturnType!(F) run(F, Args...)(F fpOrDelegate, ref Args args);
So it's documented but I'm not supposed to care. Why not just remove?
Surely there must be higher-level examples that clarify that I can use
delegates etc.
* The examples have code at top-level. That's fine for short snippets
but not when using import etc. I recommend putting the code inside
unittests or function bodies for such cases.
* "If you want to escape the Task object from the function in which it
was created or prefer to heap allocate and automatically submit to the
pool, see TaskPool.task()." I'm uncomfortable that I need to remember a
subtle distinction of the same primitive name ("task") depending on
membership in a TaskPool, a tenuous relationship to keep in mind. How
about "scopedTask()" vs. "task()"? Also, it's worth asking
ourselves what's the relative overhead of closure allocation vs. task
switching etc. Maybe we get to simplify things a lot at a small cost in
efficiency.
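Under that split the distinction would be self-documenting (a
hypothetical sketch; scopedTask is the name I'm proposing, not one in
the current docs, and the Force names follow the earlier renaming
suggestion):

```d
import std.parallelism;

int answer() { return 42; }

void main() {
    // task(): heap-allocated, safe to escape the creating scope.
    auto t1 = task!answer();
    taskPool.put(t1);

    // scopedTask(): lives in this stack frame, so it must be forced
    // before the frame exits, but it avoids the heap allocation.
    auto t2 = scopedTask!answer();
    taskPool.put(t2);

    assert(t1.yieldForce == 42);
    assert(t2.yieldForce == 42);
}
```

That would also make the closure-allocation vs. task-switching
overhead question concrete: measure the two variants on the same
workload.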
* As I mentioned, in the definition:
Task!(run,TypeTuple!(F,Args)) task(F, Args...)(F fun, Args args);
I can't tell what "run" is.
* osReportedNcpu - a case of naming convention gone wild... how about
totalCPUs or reportedCPUs.
* "A goto from inside the parallel foreach loop to a label outside the
loop will result in undefined behavior." Would this be a bug in dmd?
* I would group these two together:
ParallelForeach!(R) parallel(R)(R range, size_t workUnitSize);
ParallelForeach!(R) parallel(R)(R range);
Just put a /// ditto for the second, use only one code snippet that
illustrates two foreach loops, one with the explicit parameter and one
without. In fact swap the two - most of the time people will use the one
with the default work unit size. Better yet, don't specify 512, because
later on you may come up with a better algorithm to determine the
optimal block size.
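Concretely, the combined snippet could be (a sketch assuming parallel()
behaves as documented):

```d
import std.parallelism;

void main() {
    auto a = new double[1_000];

    // Typical case: let the implementation pick the work unit size.
    foreach (i, ref x; taskPool.parallel(a))
        x = i;

    // Expensive loop bodies: ask for small work units explicitly so
    // every worker thread stays busy.
    foreach (i, ref x; taskPool.parallel(a, 1))
        x = x * x;

    assert(a[3] == 9.0);
}
```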
* "workUnitSize: The number of elements to evaluate in a single Task.
Must be less than or equal to bufSize, and in practice should be a
fraction of bufSize such that all worker threads can be used." Then why
not specify a different parameter, such as a multiplier? The dependence
between bufSize and workUnitSize is a sign that these two parameters
should be rethought. If you have good reasons that the user must supply
the parameters in this form, give an example substantiating that.
* "auto myMax = taskPool.reduce!"a + b * b"(myArr);" This is not
computing anything meaningful. To compute the sum of squares, you need
to give the seed 0.0.
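That is, the example should read
auto sumSquares = taskPool.reduce!"a + b * b"(0.0, myArr);
A self-contained sketch (here I square via std.algorithm.map and
reduce with plain "a + b", which keeps the merging of per-work-unit
partial results associative, as a parallel reduction requires):

```d
import std.algorithm, std.parallelism, std.range;

void main() {
    auto arr = iota(1.0, 101.0); // 1.0, 2.0, ..., 100.0
    // Square each element lazily, then add with seed 0.0; "a + b"
    // stays associative when partial results are merged.
    auto sumSquares = taskPool.reduce!"a + b"(0.0, arr.map!"a * a");
    assert(sumSquares == 338350.0); // 100 * 101 * 201 / 6
}
```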
* Again: what's the speedup of e.g. parallel min/max vs. serial, or of
a pi computation, on a typical machine?
* The join example is fine, but the repetitive code suggests that loops
should be better:
import std.file, std.parallelism;

void main() {
    auto pool = new TaskPool();
    foreach (filename; ["foo.txt", "bar.txt", "baz.txt"]) {
        pool.put(task!read(filename));
    }
    // Call join() to guarantee that all tasks are done running, the
    // worker threads have terminated, and that the results of all of
    // the tasks can be accessed without any synchronization primitives.
    pool.join();
    ubyte[][] data; // void[], meh
    // Use spinWait() since the results are guaranteed to have been
    // computed and spinWait() is the cheapest of the wait functions.
    foreach (task; pool.howDoIEnumerateTasks()) {
        data ~= task.spinWait();
    }
}
Andrei
More information about the Digitalmars-d
mailing list