std.parallelism: Final review

dsimcha dsimcha at yahoo.com
Fri Mar 18 19:27:14 PDT 2011


I think your use case is both beyond the scope of std.parallelism and 
better handled by std.concurrency.  std.parallelism is mostly meant to 
handle the pure multicore parallelism use case.  It's not that it 
**can't** handle other use cases, but that's not what it's tuned for.
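
For the "start a task, then get a message back on the main thread when
it's done" pattern, std.concurrency already gives you the basic shape.
A minimal sketch (the worker function and the message format are
hypothetical placeholders, not anything from your actual app):

import std.concurrency, std.stdio;

// Hypothetical worker:  do some I/O and parsing, then report back.
void worker(string path) {
    auto result = "parsed: " ~ path;   // stand-in for the real work
    ownerTid.send(result);             // message back to the main thread
}

void main() {
    spawn(&worker, "foo.txt");

    // The main thread stays free for other work; here it just waits
    // for the worker's message.
    auto msg = receiveOnly!string();
    writeln(msg);
}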

As far as prioritization goes, it wouldn't be hard to prioritize when 
a task starts (i.e. keep a high-priority and a low-priority queue). 
However, the whole point of TaskPool is to avoid starting a new 
thread for each task.  Threads are recycled for efficiency, so there 
is no per-task thread whose priority could be adjusted in the OS 
scheduler.  I also don't see how to generalize prioritization to map, 
reduce, parallel foreach, etc. without making the API much more 
complex.

In addition, std.parallelism guarantees that tasks are started in the 
order they're submitted, with one exception:  if a task's result is 
needed immediately and the task hasn't started yet, it is pulled out 
of the middle of the queue and executed right away.  So one way to 
get the prioritization you need is simply to submit the tasks in 
priority order, assuming you're submitting them all from the same 
place.
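
A minimal sketch of that, using task and put (highPriorityJob and
lowPriorityJob are just hypothetical placeholders):

import std.parallelism, std.stdio;

int highPriorityJob() { return 1; }   // hypothetical placeholder
int lowPriorityJob()  { return 2; }   // hypothetical placeholder

void main() {
    // Queued tasks are started in FIFO submission order, so submit
    // the most urgent work first.
    auto hi = task!highPriorityJob();
    auto lo = task!lowPriorityJob();

    taskPool.put(hi);   // submitted first, started first
    taskPool.put(lo);

    // yieldForce blocks until the result is available (and runs the
    // task in this thread if it hasn't been started yet).
    writeln(hi.yieldForce, " ", lo.yieldForce);
}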

One last thing:  As far as I/O goes, asyncBuf may be useful.  It lets 
you pipeline reading a file with the higher-level processing of its 
contents.  Example:

// Read the lines of a file into memory in parallel with processing
// them.
import std.stdio, std.parallelism, std.algorithm, std.array, std.conv;

void main() {
    // byLine reuses its buffer, so idup each line before it's stored.
    auto lines = map!"a.idup"(File("foo.txt").byLine());

    // asyncBuf reads ahead in a worker thread while this thread
    // processes the buffered lines.
    auto pipelined = taskPool.asyncBuf(lines);

    foreach(line; pipelined) {
        auto ls = line.split("\t");       // std.array.split
        auto nums = to!(double[])(ls);    // std.conv.to
    }
}

On 3/18/2011 9:27 PM, Michel Fortin wrote:
> On 2011-03-18 17:12:07 -0400, Andrei Alexandrescu
> <SeeWebsiteForEmail at erdani.org> said:
>
>> On 3/18/11 3:55 PM, dsimcha wrote:
>>> It's kinda interesting--I don't know at all where this lib stands.
>>> The deafening
>>> silence for the past week makes me think one of two things is true:
>>>
>>> 1. std.parallelism solves a problem that's too niche for 90% of D
>>> users, or
>>>
>>> 2. It's already been through so many rounds of discussion in various
>>> places
>>> (informally with friends, then on the Phobos list, then on this NG)
>>> that there
>>> really is nothing left to nitpick.
>>>
>>> I have no idea which of these is true.
>>
>> Probably a weighted average of the two. If I were to venture a guess
>> I'd ascribe more weight to 1. This is partly because I'm also
>> receiving relatively little feedback on the concurrency chapter in
>> TDPL. Also the general pattern on many such discussion groups is that
>> the amount of traffic on a given topic is inversely correlated with
>> its complexity.
>
> One reason might also be that not many people are invested in D for such
> things right now. It's hard to review such code and make useful comments
> without actually testing it on a problem that would benefit from its use.
>
> If I was writing in D the application I am currently writing, I'd
> certainly give it a try. But the thing I have that would benefit from
> something like this is in Objective-C (it's a Cocoa program I'm
> writing). I'll eventually get D to interact well with Apple's
> Objective-C APIs, but in the meantime all I'm writing in D is some
> simple web stuff which doesn't require multithreading at all.
>
> In my application, what I'm doing is starting hundreds of tasks from the
> main thread, and once those tasks are done they generally send back a
> message to the main thread through Cocoa's event dispatching mechanism.
> From a quick glance at the documentation, std.parallelism offers what
> I'd need if I were to implement a similar application in D. The only
> thing I don't see is a way to prioritize tasks: some of my tasks need a
> more immediate execution than others in order to keep the application
> responsive.
>
> One interesting bit: what I'm doing in those tasks is mostly I/O on the
> hard drive combined with some parsing. I find a task queue is useful to
> manage all the work, in my case it's not really about maximizing the
> utilization of a multicore processor but more about keeping it out of
> the main thread so the application is still responsive. Maximizing speed
> is still a secondary objective, but given most of the work is I/O-bound,
> having multiple cores available doesn't help much.
>


