std.parallelism changes done

Sönke Ludwig ludwig at informatik.uni-luebeck.de
Fri Mar 25 02:33:17 PDT 2011


Am 25.03.2011 05:14, schrieb dsimcha:
> On 3/24/2011 10:21 PM, Sönke Ludwig wrote:
>>>
>>> Can you elaborate and/or provide an example of the "general" problem?
>>> I'm not quite sure what you're getting at.
>>
>> I have one very specific constellation that I can only sketch.. Suppose
>> you have some kind of complex computation going on in the ThreadPool.
>> This computation is done by a large set of tasks where each tasks
>> depends on the result of one or more other tasks. One task is
>> responsible for coordinating the work - it is spawning tasks and waiting
>> for their completion to spawn new tasks for which the results are now
>> available.
>>
>
> As I've said before in related discussions, you are _probably_ better
> off using one of the high level primitives instead of using tasks
> directly in these cases. If not, I'd prefer to improve the higher level
> primitives and/or create new ones if possible. (Feel free to suggest one
> if you can think of it.) Tasks are, IMHO, too low level for anything
> except basic future/promise parallelism and implementing higher level
> primitives. Incidentally the situation you describe (a coordinator task
> creating lots of worker tasks) is exactly how amap(), reduce() and
> parallel foreach work under the hood. This complexity is completely
> encapsulated, though.

I would certainly agree that this belongs in a higher-level structure. 
This structure would basically get a set of special tasks, where each of 
those tasks has a list of all the tasks it depends on. All tasks would 
then be executed in parallel on a thread pool in an order that 
satisfies their dependencies - possibly with some form of cost function 
that controls which task should come first when multiple orders are 
possible.

One problem here, for example, is that in the system I have, I need 
to execute several tasks in the main thread by sending a message to it 
(the main thread executes window messages in a loop). Specifically, this 
is for tasks that use OpenGL or a similar API that has a single thread 
assigned - and the main thread is the most efficient one to use because 
it already has an OpenGL context.

The important thing is to either support such things or to make the 
design general enough to let the user add them from the outside. 
Otherwise, if you really need such things, the only option is to use a 
completely custom thread pool, and this means no parallel foreach, map, 
reduce, or whatever might be added later.
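For illustration only, the "execute a task on the main thread via its message loop" pattern might look like this hypothetical Python sketch, where `run_on_main` and `main_loop` stand in for posting a window message and the message pump:

```python
import queue
import threading

main_queue = queue.Queue()   # stands in for the window-message queue
STOP = object()

def run_on_main(fn):
    """Post `fn` to the main thread and block until it has run there.
    Caution: calling this from a pool task while the main thread is
    itself waiting on that pool is exactly the deadlock to avoid."""
    finished = threading.Event()
    box = {}
    def message():
        box["result"] = fn()
        finished.set()
    main_queue.put(message)
    finished.wait()
    return box["result"]

def main_loop():
    """The 'message pump': the main thread executes posted work
    (e.g. OpenGL calls that must happen on the context's thread)."""
    while True:
        msg = main_queue.get()
        if msg is STOP:
            break
        msg()
```

A dependency-aware scheduler would have to know that such tasks go to this queue rather than to the pool, which is why supporting or at least not precluding user-defined execution targets matters.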

>
>> First thing here is that you do not want to do the waitForce() kind of
>> waiting in the coordinator task because this might cause the coordinator
>> to be busy with an expensive task while it could already spawn new tasks
>> because maybe in the meantime some other tasks have already finished.
>
> I assume you mean yieldForce().

Yes, sorry, got the names mixed up.

>
>>
>> However, if you wait for a condition variable instead (which is fired
>> after each finished task) and if you can have multiple computations of
>> this kind running in parallel, you can immediately run into the
>> situation that the thread pool is crowded with coordinator tasks that
>> are all waiting for their condition variables which will never be
>> triggered because no worker tasks can be executed.
>
> I assume you're talking about a condition variable other than the one
> yieldForce() uses. As mentioned previously, in the specific case of
> yieldForce() this is a solved problem. In the general case I can see the
> problem.
>

Yes, just the general problem with other condition variables.
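The starvation scenario is easy to reproduce in any pool-based system. In this hypothetical Python sketch, a single-threaded pool stands in for a pool whose threads are all occupied by coordinators: the coordinator blocks on a plain event that only a queued worker can set, so the wait times out instead of completing. (This is what yieldForce() avoids in the specific case, by executing the dependency in the waiting thread.)

```python
import concurrent.futures
import threading

pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)

def coordinator():
    finished = threading.Event()
    def worker():
        finished.set()
    pool.submit(worker)            # queued behind the coordinator itself
    # Waiting on a plain event/condition variable does NOT execute the
    # dependency the way yieldForce() does -- the pool's only thread is
    # stuck right here, so the worker can never run.
    return finished.wait(timeout=0.5)

stuck = pool.submit(coordinator).result()
print(stuck)   # False: the worker never ran while the coordinator waited
pool.shutdown()
```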

>>
>> This is only one example and basically this problem can arise in all
>> cases where one task depends on another task by some form of waiting
>> that will not execute the dependency like waitForce() does.
>
> Hmm, ok, I definitely understand the problem now.
>
>>>> But what I wanted to say is, even if it may be difficult to implement
>>>> such thread caching now, putting means to execute a Task in its own
>>>> thread now into the ThreadPool allows for such an optimization later
>>>> (it
>>>> could even exist while still keeping Task.executeInNewThread()).
>>>
>>> I can't really comment because I still don't understand this very well.
>>
>> I hope I could make it a little more clear what I mean. The problem is
>> just that the system I'm talking about is quite complex and it's not
>> easy to find good and simple examples in that system. The problems of
>> course arise only in the most complex paths of execution..
>>
>> What I'm not sure about is if executeInNewThread() is supposed to be
>> useful just because it is sometimes nice to have the fine-grained
>> parallelism of the OS scheduler as opposed to task granularity, or if
>> the advantage is supposed to be efficiency gained because the thread
>> pool is not created. In the latter case the caching of some threads to
>> be reused for a executeInOwnThread()-method should lead to a better
>> performance in almost any case where thread creation overhead is
>> relevant.
>
> Ok, now I'm starting to understand this. Please correct me (once you've
> gotten a good night's sleep and can think again) wherever this is wrong:
>
> 1. As is currently the case, executeInNewThread() is _guaranteed_ to
> start the task immediately. There is never a queue involved.
>
> 2. Unlike the current implementation, executeInNewThread() may use a
> cached thread. It will **NOT**, however, put the task on a queue or
> otherwise delay its execution. If no cached thread is available, it will
> create a new one and possibly destroy it when the task is done.

Exactly.

>
> Thanks for this suggestion. Now that I (think I) understand it, it makes
> sense in principle. The devil may be in the details, though.
>
> 1. How many threads should be cached? I guess this could just be
> configurable with some reasonable default.

A configurable minimum number of threads sounds reasonable. The default 
could probably be a fixed small number like 1 or 2.
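A minimal sketch of the cached-thread variant being proposed, assuming the semantics agreed on above: immediate start, never queued behind other tasks, and reuse of up to a small number of parked threads. The `ThreadCache` class and all its names are hypothetical Python illustration, not part of std.parallelism:

```python
import queue
import threading

class ThreadCache:
    """Run each task on its own thread immediately (never queued behind
    other tasks), reusing up to `max_idle` parked threads so that thread
    creation overhead is only paid when the cache is exhausted."""
    def __init__(self, max_idle=2):
        self.max_idle = max_idle
        self.idle = []                    # channels of parked threads
        self.lock = threading.Lock()

    def execute(self, fn):
        with self.lock:
            chan = self.idle.pop() if self.idle else None
        if chan is None:                  # no cached thread: create one
            chan = queue.Queue(maxsize=1)
            threading.Thread(target=self._worker, args=(chan,),
                             daemon=True).start()
        chan.put(fn)                      # hand the task straight to a thread

    def _worker(self, chan):
        while True:
            fn = chan.get()
            if fn is None:                # could serve as a shutdown signal
                return
            fn()
            with self.lock:
                if len(self.idle) >= self.max_idle:
                    return                # cache full: let this thread die
                self.idle.append(chan)    # park for reuse
```

Each task still gets a dedicated thread (so the "no queue, starts immediately" guarantee of executeInNewThread() is preserved); only the creation cost is amortized.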

>
> 2. Should the cache be lazily or eagerly created? I'd assume lazily.

Lazy sounds good.

>
> 3. Where would these threads be stored? I think they probably belong in
> some kind of thread-safe global data structure, **NOT** in a TaskPool
> instance.

That's a good question. ThreadPool would be nice because it is the class 
you may already be dragging an instance of through your code. Global 
would certainly work, too.

>
> 4. How would we explain to people what the cache is good for and how to
> use it? The fact that you're proposing it and even you find this
> difficult to convey makes me skeptical that this feature is worth the
> weight it adds to the API. Maybe you'll find it easier once you get some
> sleep. (I understand the problem it solves at an abstract level but have
> never encountered a concrete use case. It also took me a long time to
> understand it.)

I would basically just say it's a faster way than creating a new thread 
each time you start a task. Use it whenever you need to have a task run 
outside of the thread-pool threads - candidates are tasks that wait a 
lot, either because of I/O or because of waiting primitives other than 
the ones present in ThreadPool (message queues, condition variables).

But please don't make the mistake of dismissing the problem because it 
is complex. Being complex and maybe rare does not mean it cannot be 
important. It's like a bug that will delete your data, but only in very 
rare and complex use cases of the application. You would not want to 
ignore it just for those reasons.

Also, I'm not sure whether using the primitives of std.concurrency is 
allowed within a Task - maybe not. But if it is, it would be really easy 
to construct a higher-level example without condition variables and the 
like.

>
> 5. It would break some relaxations of what @safe tasks can do when
> started via executeInNewThread().

You mean because of TLS that is not reinitialized each time? I have to 
admit that I can't really gauge the impact of this.

>
> 6. This whole proposal might fit better in std.concurrency, by using a
> thread cache for spawn().

But isn't the previous problem (5.) even more relevant in 
std.concurrency? Putting it near ThreadPool could be a good idea because 
it still is some sort of thread pool in the abstract sense. It could 
also be something that std.concurrency uses for its spawn().

Anyway, I would be happy if a place were allocated for this, wherever it 
fits. If it is std.concurrency, that's fine, as long as std.concurrency 
and std.parallelism play well together. One problem with std.concurrency 
is that it does not really work well when you need to wait for 
primitives other than a message. Having something like 
WaitForMultipleObjects is critical in many cases, but that's another topic.
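To make the WaitForMultipleObjects point concrete, here is a hypothetical portable sketch of what such a primitive amounts to, layered on plain events with one watcher thread per event (a real implementation would of course use a single OS-level multiplexing call instead of watcher threads):

```python
import threading

def wait_any(events, timeout=None):
    """A poor man's WaitForMultipleObjects for threading.Event objects:
    block until any event in `events` is set and return its index,
    or None on timeout."""
    results = []
    trigger = threading.Event()
    def watch(i, ev):
        if ev.wait(timeout):
            results.append(i)
            trigger.set()
    for i, ev in enumerate(events):
        threading.Thread(target=watch, args=(i, ev), daemon=True).start()
    trigger.wait(timeout)
    return results[0] if results else None
```

With a primitive like this, a task could wait on "message arrived OR worker finished OR shutdown requested" in one call, which is exactly what a message-only receive cannot express.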

Having said all this, I just want to make sure that you don't get this 
wrong. I certainly do not want to push in complex changes for no reason 
- or any complex changes at all, for that matter. And in this case I see 
how it is _functionally_ independent from the ThreadPool itself.

My problem here is just that there is an executeInNewThread() function 
that really should not be used for _many_ tasks, and in most other cases 
it would maybe be cleaner to use spawn(). It may still have its place if 
you take the @safe implications into account, but I would also like to 
see a function supporting the same threading guarantees while being 
suitable for many tasks, if only to avoid bad usage patterns of 
executeInNewThread().

However it definitely is in the gray area between std.concurrency and 
std.parallelism.


More information about the Digitalmars-d mailing list