[phobos] Parallelism in Phobos

David Simcha dsimcha at gmail.com
Fri Sep 10 18:01:20 PDT 2010


  On 9/10/2010 8:05 PM, Michel Fortin wrote:
> On 2010-09-10, at 17:13, David Simcha wrote:
>
>> As far as I can tell, your needs might be better served by std.concurrency.
> From what I can see, your parallel foreach is basically some syntactic sugar for queuing tasks inside a loop and then blocking until the result is ready. While I'll admit I'm not sure I need that sugar or to block waiting for the result, queuing tasks in a loop is certainly something I need.
It's slightly more complicated than that under the hood because:

1.  If your range has a huge amount of stuff, you want to lazily add it 
to the queue, not add it all upfront.  Parallel foreach does some magic 
under the hood so that you can parallel foreach over a range of size N 
in O(1) memory even if you want small work units.  Modulo the workaround 
for a Linux-specific compiler bug, parallel foreach doesn't even heap 
allocate.

2.  Parallel foreach works with non-random access ranges by buffering 
data for small work units in an array.
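
The two points above can be sketched with a minimal usage example, assuming the `parallel` free function and explicit work unit size of the module under discussion (as std.parallelism later shipped):

```d
import std.math : sqrt;
import std.parallelism : parallel;

void main()
{
    auto results = new double[10_000];
    // A work unit size of 100 means each queued task handles 100
    // consecutive elements; the range is consumed lazily rather than
    // queued up front, keeping memory overhead O(1) in the range size.
    foreach (i, ref r; parallel(results, 100))
        r = sqrt(cast(double) i);
    assert(results[100] == 10);
}
```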

> With my app I can easily have 1000 of these tasks queued at a given time (I effectively have a couple of loops that can add tasks to a queue). They mostly read and parse files to extract some pieces of data. At the API level, std.concurrency looks like it could do that, except it'd be creating one thread for each task. I don't want to create one thread for each task, so I need some sort of task queue and a thread pool.
>
> But maybe you're right, and maybe the thread pool should go in std.concurrency where creating and queuing a task could work like spawning a thread, perhaps like this:
>
> 	// send task to a specific thread to be executed there
> 	tid.perform(&taskFunc, "hello world");
>
> 	// queue task for execution in a thread pool
> 	tpool.dispatch(&taskFunc, "hello world");
>
> Those two things I'd find quite useful. And it'd be pretty much trivial to build a parallel foreach on top of this.
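
Michel's queue-a-thousand-tasks pattern maps onto the module's task primitives; here is a hedged sketch, assuming the `task`, `taskPool.put`, and `yieldForce` names the module eventually stabilized on, with `slowSquare` standing in for the file-parsing work:

```d
import std.parallelism : task, taskPool;

int slowSquare(int x) { return x * x; }  // stand-in for parsing one file

void main()
{
    typeof(task!slowSquare(0))[] tasks;
    foreach (i; 0 .. 1000)
    {
        auto t = task!slowSquare(i);
        taskPool.put(t);      // queued on the shared pool: no thread per task
        tasks ~= t;
    }
    long sum;
    foreach (t; tasks)
        sum += t.yieldForce;  // block until that task's result is ready
    assert(sum == 332_833_500);
}
```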

This is getting me thinking.  I've given up making most of 
std.parallelism safe.  Parallel foreach is the hardest thing to make 
safe, and for me personally the most useful part of std.parallelism.  I 
wonder, though, if I can make Task @safe/@trusted provided:

1.  The input args are either indirection-free, immutable, or shared.

2.  The callable is a function pointer, not a delegate, alias or class 
with overloaded opCall.

3.  The return type is either indirection-free, immutable, or shared.  
(This is, unfortunately, necessary because the worker thread could in 
theory hold onto a reference to it in TLS after returning, even though 
doing so would be thoroughly idiotic in most cases.)

I'm thinking I may add a safeTask() function that is marked @trusted, 
and creates a Task object iff these constraints are satisfied (and 
otherwise doesn't compile).  I think the only sane way to do this is to 
have a separate safe function for creating tasks in addition to the more 
lenient "here be dragons" one.  The only major thing I don't like about 
this is the idea of sprinkling a few safe functions in a mostly "here be 
dragons" module.  It seems like it would complicate code reviews.
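
The three constraints could be expressed as a template constraint on the proposed function; the sketch below uses assumed names (`isTransferable`, `safeTask`) and returns the result directly as a placeholder rather than creating a real Task:

```d
import std.meta : allSatisfy;
import std.traits : hasIndirections, isFunctionPointer, ReturnType;

// A type is safe to hand across threads if it has no indirection,
// or is immutable or shared (constraints 1 and 3 above).
enum isTransferable(T) = !hasIndirections!T
    || is(T == immutable) || is(T == shared);

// Constraint 2: only function pointers, not delegates or opCall types.
auto safeTask(F, Args...)(F fn, Args args) @trusted
    if (isFunctionPointer!F
        && allSatisfy!(isTransferable, Args)
        && isTransferable!(ReturnType!F))
{
    return fn(args);  // placeholder: a real version would queue a Task
}

void main()
{
    static int add(int a, int b) { return a + b; }
    assert(safeTask(&add, 1, 2) == 3);
    // Passing a function taking int* would fail to compile.
}
```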

> And just to add weight to the argument that task based concurrency is used pretty much everywhere: I worked before on some industrial software that had this too. It basically had to perform some analysis every time new data came in, in real-time. A new task was created for each piece of data and dispatched to a thread pool, then a few seconds later the result was sent to another thread that'd take some action based on the analysis.

Glad to hear that this might be useful outside scientific computing.

