std.parallelism curious results
Ali Çehreli via Digitalmars-d-learn
digitalmars-d-learn at puremagic.com
Sun Oct 5 14:25:39 PDT 2014
On 10/05/2014 07:27 AM, flamencofantasy wrote:
> I am summing up the first 1 billion integers in parallel and in a single
> thread and I'm observing some curious results;
>
> parallel sum : 499999999500000000, elapsed 102833 ms
> single thread sum : 499999999500000000, elapsed 1667 ms
>
> The parallel version is 60+ times slower
Reducing the number of threads is key. However, unlike what others said,
parallel() does not use that many threads. By default, TaskPool objects
are constructed with 'totalCPUs - 1' worker threads, and all of parallel()'s
iterations are executed on those few threads.
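To check the thread count on a given machine, a minimal sketch like the following prints the relevant std.parallelism properties. It assumes the global taskPool is used with its defaults and that defaultPoolThreads has not been changed before the pool is first touched.

```d
import std.parallelism : defaultPoolThreads, taskPool, totalCPUs;
import std.stdio : writeln;

void main()
{
    // Number of logical cores reported for this machine.
    immutable uint cpus = totalCPUs;
    // Worker threads the global pool is created with (totalCPUs - 1 unless changed).
    immutable uint defaultWorkers = defaultPoolThreads;
    // Worker threads actually present in the lazily created global pool.
    immutable size_t workers = taskPool.size;

    writeln("totalCPUs:          ", cpus);
    writeln("defaultPoolThreads: ", defaultWorkers);
    writeln("taskPool.size:      ", workers);
}
```

The main thread also takes part in a parallel foreach, which is why the pool itself holds one worker fewer than the core count.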
The main problem here is the use of atomicOp, which necessarily
synchronizes the whole process.
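The original program is not quoted here, so the following is only a guess at the slow pattern: one atomicOp per element, all targeting the same variable. The name contendedSum and the small element count are made up for illustration.

```d
import core.atomic : atomicLoad, atomicOp;
import std.parallelism : parallel;
import std.range : iota;
import std.stdio : writeln;

// Hypothetical reconstruction of the slow version: every iteration performs
// an atomic read-modify-write on the same shared variable, so the worker
// threads spend their time contending for it instead of adding.
ulong contendedSum(ulong n)
{
    shared ulong sum = 0;
    foreach (i; parallel(iota(0UL, n)))
    {
        atomicOp!"+="(sum, i);
    }
    return atomicLoad(sum);
}

void main()
{
    // Small n for illustration; the original post used 1 billion.
    writeln(contendedSum(1_000_000));
}
```

The result is correct, but the atomic update serializes the only real work in the loop body, so adding threads adds contention rather than speed.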
Something like the following takes advantage of parallelism and cuts
the execution time in half on my machine (4 logical cores: 2 physical
ones, hyperthreaded).
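import core.atomic, std.parallelism, std.range, std.stdio;

// Sums the half-open range [beg, end) locally, with no synchronization.
ulong adder(ulong beg, ulong end)
{
    ulong localSum = 0;
    foreach (i; beg .. end) {
        localSum += i;
    }
    return localSum;
}

void main()
{
    // iter and sum come from the original program; these definitions are assumed.
    enum ulong iter = 1_000_000_000;
    shared ulong sum = 0;
    enum totalTasks = 10;    // must divide iter evenly
    foreach (i; parallel(iota(0, totalTasks)))
    {
        ulong beg = i * iter / totalTasks;
        ulong end = beg + iter / totalTasks;
        // One atomic update per task rather than one per element.
        atomicOp!"+="(sum, adder(beg, end));
    }
    writeln(atomicLoad(sum));
}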
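As an alternative to chunking by hand, std.parallelism also offers taskPool.reduce, which splits the range into work units and combines the partial results itself. This is a different technique from the one above, sketched here with a hypothetical helper name (reducedSum) and a small element count.

```d
import std.parallelism : taskPool;
import std.range : iota;
import std.stdio : writeln;

// Each worker reduces its own slice of the range; the partial sums are
// combined at the end, so no explicit atomic operation is needed.
ulong reducedSum(ulong n)
{
    return taskPool.reduce!"a + b"(0UL, iota(0UL, n));
}

void main()
{
    // Small n for illustration; the original post used 1 billion.
    writeln(reducedSum(1_000_000));
}
```

The seed 0UL keeps the accumulator type ulong, which matters once the sum no longer fits in 32 bits.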
Ali