std.parallelism changes done

dsimcha dsimcha at yahoo.com
Thu Mar 24 21:14:26 PDT 2011


On 3/24/2011 10:21 PM, Sönke Ludwig wrote:
>>> Indeed this pattern solves the problem to wait for the completion of a
>>> specific task. It also avoids a huge potential of deadlocks that a
>>> general yield() that does not take a task would have. However, it will
>>> not solve the general problem of one task waiting for another, which
>>> could be in terms of a condition variable or just a mutex that is used
>>> in the middle of the task execution.
>>>
>>
>> Can you elaborate and/or provide an example of the "general" problem?
>> I'm not quite sure what you're getting at.
>
> I have one very specific scenario that I can only sketch... Suppose
> you have some kind of complex computation going on in the ThreadPool.
> This computation is done by a large set of tasks where each task
> depends on the result of one or more other tasks. One task is
> responsible for coordinating the work - it spawns tasks and waits for
> their completion in order to spawn new tasks whose inputs have become
> available.
>

As I've said before in related discussions, you are _probably_ better 
off using one of the high-level primitives instead of using tasks 
directly in these cases.  If not, I'd prefer to improve the high-level 
primitives and/or create new ones if possible.  (Feel free to suggest 
one if anything comes to mind.)  Tasks are, IMHO, too low-level for 
anything except basic future/promise parallelism and implementing 
higher-level primitives.  Incidentally, the situation you describe (a 
coordinator task creating lots of worker tasks) is exactly how amap(), 
reduce() and parallel foreach work under the hood.  This complexity is 
completely encapsulated, though.
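
For instance, all three of these run the coordinator/worker pattern you 
describe internally (just a sketch against the std.parallelism API as 
documented; the log/sqrt work is made up for illustration):

import std.array, std.math, std.parallelism, std.range;

void main() {
    auto nums = iota(1.0, 1_000_000.0).array;

    // Parallel foreach:  the pool plays coordinator, spawning and
    // joining worker tasks behind the scenes.
    auto logs = new double[nums.length];
    foreach (i, x; taskPool.parallel(nums)) {
        logs[i] = log(x);
    }

    // amap():  same coordinator/worker pattern, fully encapsulated.
    auto roots = taskPool.amap!sqrt(nums);

    // reduce():  a parallel sum, again with no explicit Task objects.
    auto sum = taskPool.reduce!"a + b"(nums);
}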

> First thing here is that you do not want to do the waitForce() kind of
> waiting in the coordinator task, because this might cause the
> coordinator to be busy with an expensive task while it could already
> be spawning new tasks, since other tasks may have finished in the
> meantime.

I assume you mean yieldForce().

>
> However, if you wait on a condition variable instead (one that is
> signaled after each task finishes), and you can have multiple
> computations of this kind running in parallel, you can immediately run
> into a situation where the thread pool is crowded with coordinator
> tasks that are all waiting on their condition variables, which will
> never be signaled because no worker tasks can be executed.

I assume you're talking about a condition variable other than the one 
yieldForce() uses.  As mentioned previously, in the specific case of 
yieldForce() this is a solved problem.  In the general case I can see 
the problem.
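
Just to make sure we're talking about the same thing, here's the 
pattern I now understand you to mean, boiled down (hypothetical sketch; 
workerFn/coordinatorFn are made-up names):

import core.sync.condition : Condition;
import core.sync.mutex : Mutex;
import std.parallelism;

__gshared Mutex m;
__gshared Condition done;
__gshared bool finished;

void workerFn() {
    // ...do one unit of work, then signal the coordinator...
    synchronized (m) {
        finished = true;
        done.notify();
    }
}

void coordinatorFn() {
    auto t = task!workerFn();
    taskPool.put(t);

    synchronized (m) {
        while (!finished) {
            // BAD on a pool thread:  blocks without ever executing the
            // dependency.  Fill the pool with coordinators stuck here
            // and no thread is left to run workerFn().
            done.wait();
        }
    }

    // t.yieldForce here instead would execute workerFn() in this very
    // thread if it hasn't started yet, which is why yieldForce() is
    // deadlock-free and a bare condition variable isn't.
}

void main() {
    m = new Mutex;
    done = new Condition(m);
    coordinatorFn();
}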

>
> This is only one example; basically, this problem can arise in all
> cases where one task depends on another task through some form of
> waiting that does not execute the dependency the way waitForce() does.

Hmm, ok, I definitely understand the problem now.

>>> But what I wanted to say is that even if it may be difficult to
>>> implement such thread caching now, putting the means to execute a
>>> Task in its own thread into the ThreadPool now allows for such an
>>> optimization later (it could even coexist with
>>> Task.executeInNewThread()).
>>
>> I can't really comment because I still don't understand this very well.
>
> I hope I could make what I mean a little clearer. The problem is just
> that the system I'm talking about is quite complex, and it's not easy
> to find good and simple examples in that system. The problems, of
> course, arise only in the most complex paths of execution...
>
> What I'm not sure about is whether executeInNewThread() is supposed to
> be useful just because it is sometimes nice to have the fine-grained
> parallelism of the OS scheduler as opposed to task granularity, or
> whether the advantage is supposed to be the efficiency gained because
> the thread pool is not created. In the latter case, caching some
> threads to be reused for an executeInOwnThread() method should lead to
> better performance in almost any case where thread creation overhead
> is relevant.

Ok, now I'm starting to understand this.  Please correct me (once you've 
gotten a good night's sleep and can think again) wherever this is wrong:

1.  As is currently the case, executeInNewThread() is _guaranteed_ to 
start the task immediately.  There is never a queue involved.

2.  Unlike the current implementation, executeInNewThread() may use a 
cached thread.  It will **NOT**, however, put the task on a queue or 
otherwise delay its execution.  If no cached thread is available, it 
will create a new one and possibly destroy it when the task is done.

Thanks for this suggestion.  Now that I (think I) understand it, it 
makes sense in principle.  The devil may be in the details, though.
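
In case it helps, here's roughly how I'd picture the mechanics.  This 
is purely hypothetical -- executeInCachedThread() and CachedThread are 
made-up names, nothing like this exists in std.parallelism today, and 
the cache size policy is a placeholder:

import core.sync.condition : Condition;
import core.sync.mutex : Mutex;
import core.thread : Thread;

// A parked thread that can be handed new jobs without being recreated.
final class CachedThread {
    private Mutex m;
    private Condition c;
    private void delegate() job;
    private bool shutdown;

    this() {
        m = new Mutex;
        c = new Condition(m);
        new Thread(&run).start();
    }

    private void run() {
        while (true) {
            void delegate() j;
            synchronized (m) {
                while (job is null && !shutdown) {
                    c.wait();
                }
                if (shutdown) return;
                j = job;
                job = null;
            }
            j();         // Run the task to completion...
            park(this);  // ...then go back on the free list, or die.
        }
    }

    void submit(void delegate() j) {
        synchronized (m) {
            job = j;
            c.notify();
        }
    }

    private void kill() {
        synchronized (m) {
            shutdown = true;
            c.notify();
        }
    }
}

// Stored globally, not in a TaskPool instance.
__gshared CachedThread[] freeList;
__gshared Mutex cacheLock;
enum maxCached = 4;  // placeholder; would be configurable

shared static this() {
    cacheLock = new Mutex;
}

// Keeps guarantee (1) above:  the task starts immediately, on a parked
// thread if one is available, on a brand new one otherwise.  Per (2),
// there is never a queue and never a delay.
void executeInCachedThread(void delegate() job) {
    CachedThread t;
    synchronized (cacheLock) {
        if (freeList.length > 0) {
            t = freeList[$ - 1];
            freeList.length -= 1;
        }
    }
    if (t is null) {
        t = new CachedThread;  // Cache miss:  spawn, don't wait.
    }
    t.submit(job);
}

private void park(CachedThread t) {
    synchronized (cacheLock) {
        if (freeList.length < maxCached) {
            freeList ~= t;
            return;
        }
    }
    t.kill();  // Cache full:  let this thread terminate.
}

A thread parks itself as soon as its task returns, so a steady stream 
of such calls would churn through at most maxCached idle threads plus 
however many tasks are running concurrently.  Even so, the open 
questions remain: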

1.  How many threads should be cached?  I guess this could just be 
configurable with some reasonable default.

2.  Should the cache be lazily or eagerly created?  I'd assume lazily.

3.  Where would these threads be stored?  I think they probably belong 
in some kind of thread-safe global data structure, **NOT** in a TaskPool 
instance.

4.  How would we explain to people what the cache is good for and how to 
use it?  The fact that even you, the one proposing it, find it difficult 
to convey makes me skeptical that this feature is worth the weight it 
adds to the API.  Maybe you'll find it easier once you get some sleep.  
(I understand the problem it solves at an abstract level but have never 
encountered a concrete use case.  It also took me a long time to 
understand it.)

5.  It would break some relaxations of what @safe tasks can do when 
started via executeInNewThread().

6.  This whole proposal might fit better in std.concurrency, by using a 
thread cache for spawn().

