parallelFuture

dsimcha dsimcha at yahoo.com
Thu Oct 22 06:21:52 PDT 2009


== Quote from Tim Matthews (tim.matthews7 at gmail.com)'s article
> dsimcha wrote:
> > I've created an alpha release of parallelFuture, a high-level parallelization
> > library for D2.  Right now, it has a task pool, futures, parallel foreach, and
> > parallel map.
> >
> > Here's the (IMHO) coolest example:
> >
> > auto pool = new ThreadPool();
> >
> > // Assuming we have a function isPrime(), print all
> > // prime numbers from 0 to uint.max, testing for primeness
> > // in parallel.
> > auto myRange = iota(0, uint.max);
> > foreach(num; pool.parallel(myRange)) {
> >     if(isPrime(num)) {
> >         synchronized writeln(num);
> >     }
> > }
> >
> > The interface is there and it seems to work, although it has not been
> > extensively stress tested yet.  Some of the implementation details could
> > admittedly use some cleaning up, and I would appreciate help from some
> > threading gurus on improving my queue (right now it's a naive synchronized
> > singly-linked list) and getting condition variables to work properly.  (Right
> > now, I'm using atomic polling followed by sleeping for 1 millisecond in a lot
> > of places.  It's a kludge, but it seems to work reasonably well in practice.)
> >
> > The code is at:
> >
> > http://dsource.org/projects/scrapple/browser/trunk/parallelFuture/parallelFuture.d
> >
> > The docs are at:
> >
> > http://cis.jhu.edu/~dsimcha/parallelFuture.html
> Nice. About the tasks:
> In .NET a worker thread is created for each CPU core. Each worker thread
> has its own local queue of tasks, which it initially retrieves from the
> global queue; but if a task creates new tasks, the worker adds them to
> its local queue directly for less contention. When it finishes a task,
> it takes the next one from the back of its local queue to take advantage
> of the cache (like a stack). Tasks at the front of a local queue can be
> stolen by another worker thread whose own local queue and the global
> queue are both empty, to maximize CPU usage.
> Also, waiting for a task to complete only finishes once all of the tasks
> it created are themselves complete (kind of like recursion).
> What are the implementation details or plans for this?
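For reference, the per-worker deque scheme described above can be sketched roughly as follows. This is a hypothetical, single-threaded illustration — the `WorkerDeque` type and its method names are invented, and a real implementation would need proper synchronization (e.g. a lock-free work-stealing deque) — not parallelFuture's actual code:

```d
import std.container : DList;

alias Task = void delegate();

// Hypothetical sketch of a per-worker deque: the owner works LIFO at the
// back for cache locality, idle workers steal FIFO from the front.
struct WorkerDeque
{
    DList!Task tasks;

    // The owning worker pushes newly spawned tasks onto the back.
    void push(Task t) { tasks.insertBack(t); }

    // The owning worker pops from the back (most recently pushed first).
    bool pop(out Task t)
    {
        if (tasks.empty) return false;
        t = tasks.back;
        tasks.removeBack();
        return true;
    }

    // An idle worker steals from the front (oldest task first).
    bool steal(out Task t)
    {
        if (tasks.empty) return false;
        t = tasks.front;
        tasks.removeFront();
        return true;
    }
}
```

The LIFO/FIFO split is the key point: the owner reuses warm cache lines, while thieves take the oldest (and typically largest-grained) work.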

For now, parallelFuture was designed with a single-producer, multiple-worker
model.  Absolutely no attempt was made to allow tasks running in the task pool
to themselves submit jobs to the same pool, because it would have made things
more complicated and I couldn't think of any use cases.  I designed the lib
with the types of use cases I encounter in my work in mind (mathy, pure
throughput-oriented computing on large, embarrassingly parallel problems).
If someone comes up with a compelling use case, though, I'd certainly consider
adding such abilities, provided they don't interfere with performance or API
simplicity in the more common cases.

To keep this discussion simple, let's define F1 as a future/task submitted by
the main producer thread, and F2 as a task/future submitted by F1.  The queue
is (for now) strictly FIFO, except that if you have a pointer to the
Task/Future object you can steal a job.  When F1 submits F2, F2 goes to the
back of the queue like anything else.  This means that when F1 waits on F2, a
cyclical dependency is possible: F1 waits on F2, while F2 waits for a worker
thread, all of which may be occupied by tasks like F1.  This is mitigated by
work stealing: F1 may simply steal F2 and execute it in its own thread.
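Concretely, the cycle and its work-stealing escape look like this. The sketch below is illustrative only: `submit` and `workWait` are invented names standing in for whatever parallelFuture calls its task-submission and steal-then-wait primitives:

```d
auto pool = new ThreadPool();

// F1 runs inside one of the pool's worker threads.
void f1()
{
    // F2 goes to the back of the shared FIFO queue, behind any other
    // pending jobs.  ("submit" is an illustrative name.)
    auto f2 = pool.submit(&someExpensiveWork);

    // The cycle: F1 blocks on F2, but F2 is still queued, and every
    // worker may itself be busy running a task like F1.  Work stealing
    // breaks the cycle: since F1 holds a pointer to the Task/Future
    // object, it can pull F2 off the queue and run it in its own thread
    // before blocking.  ("workWait" is an illustrative name.)
    f2.workWait();
}
```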

In parallel map and foreach, I should probably document this, but for now it's
undefined behavior for the mapping function or the parallel foreach loop body
to submit jobs to the task pool and wait on them; in practice, doing so will
likely deadlock.
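In other words, the pattern to avoid for now looks like this (again, `submit` and `wait` are illustrative names, not necessarily the library's actual API):

```d
auto pool = new ThreadPool();

// Anti-pattern: undefined behavior under the current design, and a
// likely deadlock in practice.
foreach(num; pool.parallel(iota(0, 1_000_000))) {
    auto t = pool.submit(&someSubTask);  // submitting from the loop body...
    t.wait();                            // ...and waiting on it from inside
                                         // the pool's own worker threads
}
```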


More information about the Digitalmars-d-announce mailing list