Quick and dirty Benchmark of std.parallelism.reduce with gdc 4.6.3

Sat Dec 15 08:11:30 PST 2012

I recently benchmarked the std.parallelism version of reduce 
using the example code, and got these timings on the following 
CPUs:

AMD FX(tm)-4100 Quad-Core Processor (Kubuntu 12.04 x64):
std.algorithm.reduce   = 70294 ms
std.parallelism.reduce = 18354 ms -> SpeedUp = ~3.83

2x AMD Opteron(tm) Processor 6128, aka 8 cores x 2 = 16 cores! (Rocks 6.0 x64):
std.algorithm.reduce   = 98323 ms
std.parallelism.reduce = 6592 ms  -> SpeedUp = ~14.91
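
For reference, the speedup is just the serial time divided by the 
parallel time, and dividing that by the core count gives a rough 
parallel efficiency. A quick check from the timings above, assuming 
the task pool kept every core busy (4 and 16 are the core counts as 
listed here, not something measured separately):

   FX-4100:     70294 / 18354 = ~3.83     3.83 / 4   = ~0.96 (about 96%)
   2x Opteron:  98323 / 6592  = ~14.92    14.92 / 16 = ~0.93 (about 93%)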

My congrats to std.parallelism and D language!

Source code, compiled with gdc 4.6.3 using the -O2 flag:
import std.algorithm, std.parallelism, std.range;
import std.stdio;
import std.datetime;

void main() {
   // Parallel reduce can be combined with std.algorithm.map to interesting
   // effect. The following example (thanks to Russel Winder) calculates
   // pi by quadrature using std.algorithm.map and TaskPool.reduce.
   // getTerm is evaluated in parallel as needed by TaskPool.reduce.
   //
   // Timings on an Athlon 64 X2 dual core machine:
   //
   // TaskPool.reduce:       12.170 s
   // std.algorithm.reduce:  24.065 s

   immutable n = 1_000_000_000;
   immutable delta = 1.0 / n;
   real getTerm(int i) {
     immutable x = ( i - 0.5 ) * delta;
     return delta / ( 1.0 + x * x ) ;
   }

   StopWatch sw;
   sw.start(); // start/resume measuring
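   // Parallel version: the terms are summed across the task pool's threads.
   immutable pi = 4.0 * taskPool.reduce!"a + b"(
       std.algorithm.map!getTerm(iota(n)) );
   // Serial baseline: comment out the statement above and uncomment this one.
   //immutable pi = 4.0 * std.algorithm.reduce!"a + b"( std.algorithm.map!getTerm(iota(n)) );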
   sw.stop();

   writeln("PI = ", pi);
   writeln("Time = ", sw.peek().msecs, "[ms]");
}
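
To avoid editing and recompiling between runs, both variants can be 
timed in one program. The following is only a rough sketch under a 
few assumptions: it targets a recent D compiler, where StopWatch 
lives in std.datetime.stopwatch (with gdc 4.6.3 it is in std.datetime, 
as in the code above); it uses a smaller n so it finishes quickly; and 
the names serialMs and parallelMs are made up for illustration.

```d
import std.algorithm : map, reduce;
import std.datetime.stopwatch : AutoStart, StopWatch; // assumption: recent Phobos
import std.parallelism : taskPool;
import std.range : iota;
import std.stdio : writefln;

void main()
{
    enum n = 100_000_000;          // smaller than the post's 1_000_000_000
    immutable delta = 1.0 / n;

    // One term of the midpoint-rule sum for the integral of 1 / (1 + x^2).
    real getTerm(int i)
    {
        immutable x = (i - 0.5) * delta;
        return delta / (1.0 + x * x);
    }

    // Serial baseline.
    auto sw = StopWatch(AutoStart.yes);
    immutable serialPi = 4.0 * reduce!"a + b"(map!getTerm(iota(n)));
    immutable serialMs = sw.peek.total!"msecs";

    // Parallel version on the default task pool.
    sw.reset();
    immutable parallelPi = 4.0 * taskPool.reduce!"a + b"(map!getTerm(iota(n)));
    immutable parallelMs = sw.peek.total!"msecs";

    writefln("serial:   PI = %s, %s ms", serialPi, serialMs);
    writefln("parallel: PI = %s, %s ms", parallelPi, parallelMs);
    writefln("speedup:  %s", cast(double) serialMs / parallelMs);
}
```

The two values of PI may differ in the last few digits, since the 
parallel version adds the terms in a different order.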