WTF! Parallel foreach slower than normal foreach on a multicore CPU?

Andrej Mitrovic andrej.mitrovich at gmail.com
Thu Jun 23 16:41:27 PDT 2011


I don't know why David set a work unit size of 100 for a 1 million
element array. I get slow results for this example:
    foreach(i, ref elem; taskPool.parallel(logs, 100)) {
        elem = log(i + 1.0);
    }

CPUs : 4
Serial usecs:   70418.
Parallel usecs: 91519.

But if I up the work unit size to 100_000 I get much better results:
    foreach(i, ref elem; taskPool.parallel(logs, 100_000)) {
        elem = log(i + 1.0);
    }

CPUs : 4
Serial usecs:   69979.
Parallel usecs: 25355.

Sometimes the best thing to do is let parallel use the default work unit size:
    foreach(i, ref elem; taskPool.parallel(logs)) {
        elem = log(i + 1.0);
    }

CPUs : 4
Serial usecs:   70219.
Parallel usecs: 21942.

Here's your original example on my PC:
CPUs : 4
Normal : 1.4609 Parallel : 2.4797

And here it is by letting parallel use the default work unit size:
CPUs : 4
Normal : 1.461 Parallel : 0.425

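It's all about fine-tuning your parameters. The work unit size is the
number of elements a worker thread takes from the array or range at a
time, so when you up it, each thread processes more elements per unit
and the pool hands out fewer units. For a 1 million element array, a
work unit size of 100 means 10,000 work units, while 100_000 means
only 10. If the loop body executes really fast, as a single log() call
does, the overhead of handing out small work units outweighs the work
itself, so you should increase the work unit size (or just use the
default).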
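
If you want to try this on your own machine, here is a minimal sketch of
a benchmark that runs the same loop with several work unit sizes. It is
not the program from earlier in the thread: the helper names timeSerial
and timeParallel are made up for this example, passing 0 to mean "use
the default work unit size" is only this sketch's convention, and it
assumes a recent compiler where StopWatch lives in
std.datetime.stopwatch. Timings will vary from machine to machine.

```d
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.math : log;
import std.parallelism : taskPool, totalCPUs;
import std.stdio : writefln;

// Serial baseline: fill `logs` and return elapsed microseconds.
// (Hypothetical helper, not part of Phobos.)
long timeSerial(double[] logs)
{
    auto sw = StopWatch(AutoStart.yes);
    foreach (i, ref elem; logs)
        elem = log(i + 1.0);
    return sw.peek.total!"usecs";
}

// Parallel version: fill `logs` and return elapsed microseconds.
// workUnitSize == 0 is this sketch's own convention for
// "let parallel pick the default work unit size".
long timeParallel(double[] logs, size_t workUnitSize)
{
    auto sw = StopWatch(AutoStart.yes);
    if (workUnitSize == 0)
    {
        foreach (i, ref elem; taskPool.parallel(logs))
            elem = log(i + 1.0);
    }
    else
    {
        foreach (i, ref elem; taskPool.parallel(logs, workUnitSize))
            elem = log(i + 1.0);
    }
    return sw.peek.total!"usecs";
}

void main()
{
    auto logs = new double[1_000_000];
    size_t[] sizes = [100, 10_000, 100_000, 0];

    writefln("CPUs : %s", totalCPUs);
    writefln("Serial usecs:   %s", timeSerial(logs));
    foreach (size; sizes)
        writefln("Parallel usecs (work unit size %s): %s",
                 size, timeParallel(logs, size));
}
```

Running each size a few times and comparing against the serial number
should show whether a small work unit size is what is hurting you.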

