std.parallelism curious results

Ali Çehreli via Digitalmars-d-learn digitalmars-d-learn at puremagic.com
Sun Oct 5 14:25:39 PDT 2014


On 10/05/2014 07:27 AM, flamencofantasy wrote:

 > I am summing up the first 1 billion integers in parallel and in a single
 > thread and I'm observing some curious results;
 >
 > parallel sum : 499999999500000000, elapsed 102833 ms
 > single thread sum : 499999999500000000, elapsed 1667 ms
 >
 > The parallel version is 60+ times slower

Reducing the number of threads is key. However, unlike what others said, 
parallel() does not use that many threads. By default, TaskPool objects 
are constructed with 'totalCPUs - 1' worker threads. All of parallel()'s 
iterations are executed on those few threads.
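
To check the pool size on a given machine, something like the following 
minimal sketch should do. defaultWorkerCount is a hypothetical helper 
name introduced here for illustration; totalCPUs and taskPool.size come 
from std.parallelism.

```d
import std.parallelism : taskPool, totalCPUs;
import std.stdio : writeln;

// Hypothetical helper: number of worker threads in the default task pool.
size_t defaultWorkerCount()
{
    return taskPool.size;
}

void main()
{
    // With default settings the second number is expected to be one
    // less than the first.
    writeln("totalCPUs:      ", totalCPUs);
    writeln("worker threads: ", defaultWorkerCount());
}
```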

The main problem here is the use of atomicOp on every iteration, which 
necessarily synchronizes the whole process.
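
For illustration, the slow pattern presumably looks something like the 
sketch below. This is an assumed reconstruction, not the original 
poster's code: perElementSum is a hypothetical name, and the loop simply 
performs one atomicOp per element, so every thread contends on the same 
variable for every single addition.

```d
import core.atomic : atomicLoad, atomicOp;
import std.parallelism : parallel;
import std.range : iota;
import std.stdio : writeln;

// Hypothetical reconstruction of the slow approach: every iteration
// performs an atomic read-modify-write on the one shared variable.
ulong perElementSum(ulong iter)
{
    shared ulong sum = 0;

    foreach (i; parallel(iota(iter)))
    {
        atomicOp!"+="(sum, i);   // one synchronized update per element
    }

    return atomicLoad(sum);
}

void main()
{
    // A smaller count than the original 1 billion keeps the run short.
    writeln(perElementSum(1_000_000));
}
```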

Something like the following takes advantage of parallelism and cuts 
the execution time in half on my machine (4 logical cores: 2 actual 
ones, hyperthreaded).

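     // Sums the half-open range [beg, end) without touching shared state.
     ulong adder(ulong beg, ulong end)
     {
         ulong localSum = 0;

         foreach (i; beg .. end) {
             localSum += i;
         }

         return localSum;
     }

     enum totalTasks = 10;

     // 'iter' and 'sum' are the variables from the original program.
     // This assumes that iter is evenly divisible by totalTasks.
     foreach (i; parallel(iota(0, totalTasks)))
     {
         ulong beg = i * iter / totalTasks;
         ulong end = beg + iter / totalTasks;

         // Only one atomic operation per task, not one per element
         atomicOp!"+="(sum, adder(beg, end));
     }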
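
For anyone who wants to try this, here is a minimal self-contained 
sketch of the same idea. The chunkedSum wrapper, its parameters, and the 
local shared sum are assumptions added for illustration; they are not 
from the original program. The end of each chunk is computed as 
(i + 1) * iter / totalTasks so that a count not divisible by totalTasks 
is still covered.

```d
import core.atomic : atomicLoad, atomicOp;
import std.parallelism : parallel;
import std.range : iota;
import std.stdio : writeln;

// Sums the half-open range [beg, end) without touching shared state.
ulong adder(ulong beg, ulong end)
{
    ulong localSum = 0;

    foreach (i; beg .. end)
    {
        localSum += i;
    }

    return localSum;
}

// Hypothetical wrapper: splits [0, iter) into totalTasks chunks and
// adds each chunk's local sum to the shared total exactly once.
ulong chunkedSum(ulong iter, ulong totalTasks)
{
    shared ulong sum = 0;

    foreach (i; parallel(iota(totalTasks)))
    {
        immutable beg = i * iter / totalTasks;
        immutable end = (i + 1) * iter / totalTasks;

        // One atomic operation per chunk instead of one per element
        atomicOp!"+="(sum, adder(beg, end));
    }

    return atomicLoad(sum);
}

void main()
{
    writeln(chunkedSum(1_000_000_000, 10));
}
```

The number of chunks is a tuning knob: it should be at least the number 
of worker threads, but small enough that the atomic updates stay rare.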

Ali


