std.parallelism: Final review

Michel Fortin michel.fortin at michelf.com
Sat Mar 19 06:37:17 PDT 2011


On 2011-03-18 22:27:14 -0400, dsimcha <dsimcha at yahoo.com> said:

> I think your use case is both beyond the scope of std.parallelism and 
> better handled by std.concurrency.  std.parallelism is mostly meant to 
> handle the pure multicore parallelism use case.  It's not that it 
> **can't** handle other use cases, but that's not what it's tuned for.

I know. But if this makes its way into the standard library, perhaps it 
should aim at reaching a slightly wider audience? Especially since it 
needs so little to become more general-purpose...


> As far as prioritization, it wouldn't be hard to implement 
> prioritization of when a task starts (i.e. have a high- and 
> low-priority queue).  However, the whole point of TaskPool is to avoid 
> starting a new thread for each task.  Threads are recycled for 
> efficiency.  This prevents changing the priority of things in the OS 
> scheduler.  I also don't see how to generalize prioritization to map, 
> reduce, parallel foreach, etc. w/o making the API much more complex.

I was not talking about thread priority, but ordering priority (which 
task gets chosen first). I don't really care about thread priority in 
my application, and I understand that per-task thread priority doesn't 
make much sense. If I needed per-task thread priority I'd simply make 
pools for the various thread priorities and put tasks in the right 
pools.

That said, perhaps I could do exactly that: create two or three pools 
with different thread priorities, put tasks into the right pool and let 
the OS sort out the scheduling. But then the question becomes: how do I 
choose the thread priority of a task pool? It doesn't seem possible from 
the documentation. Perhaps TaskPool's constructor should have a 
parameter for that.


> In addition, std.parallelism guarantees that tasks will be started in 
> the order that they're submitted, except that if the results are needed 
> immediately and the task hasn't been started yet, it will be pulled out 
> of the middle of the queue and executed immediately.  One way to get 
> the prioritization you need is to just submit the tasks in order of 
> priority, assuming you're submitting them all from the same place.

Most of my tasks are background tasks that just need to be done 
eventually, while others are user-requested tasks which can be requested 
at any time from the main thread. Submitting them in order of priority 
is not really an option.
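To make the distinction concrete: "ordering priority" here means only that a waiting user-requested task is picked before any waiting background task, whatever order they were submitted in. A minimal sketch of that idea as two FIFO queues follows. `TwoLevelQueue` is a hypothetical name and is not part of std.parallelism; it is also single-threaded, so a real scheduler would need a lock or other synchronization around it.

```d
import std.stdio : writeln;

// Hypothetical two-level queue illustrating "ordering priority".
// NOT part of std.parallelism, and not thread-safe: a real task
// scheduler would need a lock or other synchronization.
struct TwoLevelQueue(T)
{
    private T[] high; // user-requested tasks
    private T[] low;  // background tasks

    void put(T item, bool urgent)
    {
        if (urgent)
            high ~= item;
        else
            low ~= item;
    }

    bool empty() const
    {
        return high.length == 0 && low.length == 0;
    }

    // Takes the oldest urgent item if there is one, otherwise the oldest
    // background item. Each level stays first-in, first-out.
    T take()
    {
        assert(!empty, "take() called on an empty queue");
        T[]* q = high.length ? &high : &low;
        T item = (*q)[0];
        *q = (*q)[1 .. $];
        return item;
    }
}

void main()
{
    TwoLevelQueue!string q;
    q.put("index file 1", false);
    q.put("index file 2", false);
    q.put("open document", true); // submitted last, taken first
    while (!q.empty)
        writeln(q.take());
}
```

This prints "open document" first, then the two background items in submission order.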


> One last thing:  As far as I/O goes, AsyncBuf may be useful.  This 
> allows you to pipeline reading of a file and higher level processing. 
> Example:
> 
> // Read the lines of a file into memory in parallel with processing
> // them.
> import std.stdio, std.parallelism, std.algorithm, std.array, std.conv;
> 
> void main() {
>      auto lines = map!"a.idup"(File("foo.txt").byLine());
>      auto pipelined = taskPool.asyncBuf(lines);
> 
>      foreach(line; pipelined) {
>          auto ls = line.split("\t");
>          auto nums = to!(double[])(ls);
>      }
> }

Looks nice, but doesn't really work for what I'm doing. Currently I 
have one task per file, each task reading a relatively small file and 
then parsing its content.
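For comparison, the one-task-per-file pattern can be sketched as follows. This is a minimal sketch against the std.parallelism API as it exists in current Phobos, which may differ in detail from the version under review. The file format, `parseFile`, and the temporary file names are hypothetical stand-ins for the real parsing work.

```d
import std.algorithm : map, sum;
import std.array : array, split;
import std.conv : to;
import std.file : readText, remove, tempDir, write;
import std.parallelism : task, taskPool;
import std.path : buildPath;
import std.stdio : writeln;

// Hypothetical parser: each file holds whitespace-separated numbers.
double parseFile(string path)
{
    return readText(path).split.map!(to!double).sum;
}

void main()
{
    // Create a few small sample files so the sketch is self-contained.
    string[] paths;
    foreach (i; 0 .. 3)
    {
        auto p = buildPath(tempDir(), "fortin_sketch_" ~ i.to!string ~ ".txt");
        write(p, "1 2 3.5");
        paths ~= p;
    }
    scope (exit) foreach (p; paths) remove(p);

    // One task per file; put() queues each task on the global pool and
    // returns without waiting for it to run.
    auto tasks = paths.map!(p => task!parseFile(p)).array;
    foreach (t; tasks)
        taskPool.put(t);

    // yieldForce blocks until the result is ready, or runs the task in
    // this thread if no worker has started it yet. A thread that must
    // never block could poll t.done instead.
    foreach (i, t; tasks)
        writeln(paths[i], ": ", t.yieldForce);
}
```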

 - - -

Another remark: in the documentation for the TaskPool constructor, it says:

"Default constructor that initializes a TaskPool with one worker 
thread for each CPU reported available by the OS, minus 1 because the 
thread that initialized the pool will also do work."

This "minus 1" thing doesn't really work for me. It certainly makes 
sense for a parallel foreach use case -- whenever the current thread 
would block until the work is done, it can do some of the work too 
-- but in my use case I delegate all the work to other threads because 
my main thread isn't a dedicated worker thread and it must not block. 
It'd be nice to have a boolean parameter for the constructor to choose 
whether the main thread will work or not (and therefore whether to 
subtract 1 or not).

For the global taskPool, I guess I would just have to write 
"defaultPoolThreads = defaultPoolThreads+1" at the start of the program 
if the main thread isn't going to be working.
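Until a constructor flag like the one proposed exists, the same effect can be approximated in two ways. For the global pool, use the `defaultPoolThreads` property. For a private pool, use the `TaskPool(size_t nWorkers)` constructor with one worker per CPU. A minimal sketch against the std.parallelism API in current Phobos, assuming (as its documentation describes) that `defaultPoolThreads` only affects the global pool if it is set before `taskPool` is first accessed. `square` is a hypothetical stand-in for real work.

```d
import std.parallelism : TaskPool, defaultPoolThreads, task, taskPool, totalCPUs;
import std.stdio : writeln;

// Hypothetical stand-in for real work.
int square(int x) { return x * x; }

void main()
{
    // Option 1: grow the global pool by one worker, since the main thread
    // will not help. Assumed to take effect only before taskPool is first used.
    defaultPoolThreads = defaultPoolThreads + 1;
    auto g = task!square(6);
    taskPool.put(g);

    // Option 2: a private pool with one worker per CPU, not "minus 1".
    auto pool = new TaskPool(totalCPUs);
    scope (exit) pool.finish(); // let the workers exit once the queue drains
    auto t = task!square(7);
    pool.put(t);

    writeln("global pool: ", taskPool.size, " workers, result ", g.yieldForce);
    writeln("private pool: ", pool.size, " workers, result ", t.yieldForce);
}
```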


-- 
Michel Fortin
michel.fortin at michelf.com
http://michelf.com/



More information about the Digitalmars-d mailing list