parallel threads stalls until all thread batches are finished.

Mon Aug 28 22:37:45 UTC 2023

On Monday, 28 August 2023 at 10:33:15 UTC, Christian Köstlin 
wrote:
> On 26.08.23 05:39, Joe at bloow.edu wrote:
>> On Friday, 25 August 2023 at 21:31:37 UTC, Ali Çehreli wrote:
>>> On 8/25/23 14:27, Joe at bloow.edu wrote:
>>>
>>> > "A work unit is a set of consecutive elements of range to be
>>> processed
>>> > by a worker thread between communication with any other
>>> thread. The
>>> > number of elements processed per work unit is controlled by
>>> the
>>> > workUnitSize parameter. "
>>> >
>>> > So the question is how to rebalance these work units?
>>>
>>> Ok, your question brings me back from summer hibernation. :)
>>>
>>> This is what I do:
>>>
>>> - Sort the tasks in decreasing time order; the ones that will 
>>> take the most time should go first.
>>>
>>> - Use a work unit size of 1.
>>>
>>> The longest running task will start first. You can't get 
>>> better than that. When I print some progress reporting, I see 
>>> that most of the time N-1 tasks have finished and we are 
>>> waiting for that one longest running task.
>>>
>>> Ali
>>> "back to sleep"
>> 
>> 
>> I do not know the amount of time they will run. They are files 
>> that are being downloaded and I neither know the file size nor 
>> the download rate (in fact, the actual download happens 
>> externally).
>> 
>> While I could use a work unit size of 1, the problem then is that I 
>> would be downloading N files at once, and that will cause other 
>> problems if N is large (and sometimes it is).
>> 
>> There should be a "work unit size" and a "max simultaneous 
>> workers". Then I could set the work unit size to 1 and say the 
>> max simultaneous workers to 8 to get 8 simultaneous downloads 
>> without stalling.
>
> I think that's what is implemented atm ...
> `taskPool` creates a `TaskPool` of size `defaultPoolThreads` 
> (defaulting to totalCPUs - 1). The work unit size is only there 
> to optimize for small workloads where task / thread switching 
> would be a big performance problem (I guess). So in your case a 
> work unit size of 1 should be good.
>
> Did you try this already?
>
> Kind regards,
> Christian

Well, I have 32 cores, so with hyper-threading that would spawn 
64-1 threads. That is not really a solution, as it is too many 
simultaneous downloads IMO.

"These properties get and set the number of worker threads in the 
TaskPool instance returned by taskPool. The default value is 
totalCPUs - 1. Calling the setter after the first call to 
taskPool does not changes number of worker threads in the 
instance returned by taskPool. "

I guess I could try to see if I can change this, but I don't know 
what the "first call" is (and I'm using parallel to create it).
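
A minimal sketch of what that could look like, assuming the "first call" simply means the first time anything touches `taskPool` (which the free function `parallel` does internally). `runJobs` and the counter are hypothetical placeholders for the real download code, and 7 is only an example value. As far as I understand the docs, the thread running the `foreach` also executes work units, so 7 workers would mean up to 8 items in flight.

```d
import core.atomic : atomicLoad, atomicOp;
import std.parallelism : defaultPoolThreads, parallel, taskPool;
import std.range : iota;

// Hypothetical helper: pushes `jobs` dummy items through the global pool
// and returns how many of them were processed.
size_t runJobs(size_t jobs)
{
    shared size_t done;
    // Work unit size 1: each thread picks up one item at a time.
    foreach (i; parallel(iota(jobs), 1))
    {
        atomicOp!"+="(done, 1); // placeholder for starting one download
    }
    return atomicLoad(done);
}

void main()
{
    // Assumption: this runs before anything has used `taskPool`,
    // including any earlier call to `parallel`.
    defaultPoolThreads = 7;

    assert(runJobs(100) == 100);
    assert(taskPool.size == 7); // the global pool picked up the setting
}
```

If the setter runs after the global pool already exists, the documentation quoted above says it has no effect on that pool, so it would have to be the first thing in `main`.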

Seems that the code should simply be made more robust. Probably 
just a few lines of code to change/add at most. Maybe the 
constructor and parallel should take an argument to set the 
"totalCPUs", which defaults to getting the total number rather 
than it being hard coded.

I currently don't need or have 32+ downloads to test with ATM, so...

    this() @trusted
    {
        this(totalCPUs - 1);
    }

    /**
    Allows for custom number of worker threads.
    */
    this(size_t nWorkers) @trusted
    {

Basically everything is hard coded to use totalCPUs, and that is 
the ultimate problem. Not all tasks should use all CPUs.
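
That said, the second constructor quoted above does take a worker count, so a private pool might serve as a workaround without touching the global default. This is only a sketch under that assumption: `runLimited` and the counter are hypothetical stand-ins for the real download loop, and the worker count of 7 is just an example. As far as I can tell the calling thread also takes work units in a parallel `foreach`, so 7 workers would allow up to 8 simultaneous items.

```d
import core.atomic : atomicLoad, atomicOp;
import std.parallelism : TaskPool;
import std.range : iota;

// Hypothetical helper: processes `jobs` placeholder items on a private pool
// with `nWorkers` worker threads and returns how many items were handled.
size_t runLimited(size_t jobs, size_t nWorkers)
{
    auto pool = new TaskPool(nWorkers);
    // Let the worker threads terminate once the pool has no more work.
    scope (exit) pool.finish();

    shared size_t done;
    // Work unit size 1, so a slow item does not hold back a whole batch.
    foreach (i; pool.parallel(iota(jobs), 1))
    {
        atomicOp!"+="(done, 1); // placeholder for starting one download
    }
    return atomicLoad(done);
}

void main()
{
    assert(runLimited(100, 7) == 100);
}
```

The pool has to be finished (or created as a daemon pool) explicitly, otherwise its worker threads keep the program alive after `main` returns.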

What happens when we get 128 cores? Or even 32k at some point?

It shouldn't be a hard-coded value. It's really that simple, and 
that is where the problem originates: someone didn't think ahead.