Quick and dirty Benchmark of std.parallelism.reduce with gdc 4.6.3
Zardoz
luis.panadero at gmail.com
Sat Dec 15 08:11:30 PST 2012
I recently ran some benchmarks of the parallel version of
reduce using the example code, and I got these times on the
following CPUs:
AMD FX(tm)-4100 Quad-Core Processor (Kubuntu 12.04 x64):
std.algorithm.reduce = 70294 ms
std.parallelism.reduce = 18354 ms -> SpeedUp = ~3.83
2x AMD Opteron(tm) Processor 6128, i.e. 8 cores x 2 = 16 cores! (Rocks 6.0 x64):
std.algorithm.reduce = 98323 ms
std.parallelism.reduce = 6592 ms -> SpeedUp = ~14.92
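As a quick check of the figures above: speedup is the sequential time divided by the parallel time, and dividing the speedup by the core count gives the parallel efficiency. The efficiency values are only an estimate. They assume the task pool kept all 4 and all 16 cores busy, which was not measured separately.

```latex
S = \frac{T_{\text{seq}}}{T_{\text{par}}}, \qquad E = \frac{S}{p}

% AMD FX-4100, p = 4
S = \frac{70294}{18354} \approx 3.83, \qquad E \approx \frac{3.83}{4} \approx 0.96

% 2x AMD Opteron 6128, p = 16
S = \frac{98323}{6592} \approx 14.92, \qquad E \approx \frac{14.92}{16} \approx 0.93
```

Under that assumption, both machines run at better than 90% efficiency on this workload.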
My congratulations to std.parallelism and the D language!
Source code, compiled with gdc 4.6.3 using the -O2 flag:
import std.algorithm, std.parallelism, std.range;
import std.stdio;
import std.datetime;

void main() {
    // Parallel reduce can be combined with std.algorithm.map to interesting
    // effect. The following example (thanks to Russel Winder) calculates
    // pi by quadrature using std.algorithm.map and TaskPool.reduce.
    // getTerm is evaluated in parallel as needed by TaskPool.reduce.
    //
    // Timings on an Athlon 64 X2 dual core machine:
    //
    // TaskPool.reduce:       12.170 s
    // std.algorithm.reduce:  24.065 s

    immutable n = 1_000_000_000;
    immutable delta = 1.0 / n;

    real getTerm(int i) {
        immutable x = ( i - 0.5 ) * delta;
        return delta / ( 1.0 + x * x ) ;
    }

    StopWatch sw;
    sw.start(); // start/resume measuring

    // Parallel version:
    immutable pi = 4.0 * taskPool.reduce!"a + b"(
        std.algorithm.map!getTerm(iota(n)) );
    // Sequential version (swap it in for the statement above to compare):
    //immutable pi = 4.0 * std.algorithm.reduce!"a + b"(
    //    std.algorithm.map!getTerm(iota(n)) );

    sw.stop();
    writeln("PI = ", pi);
    writeln("Time = ", sw.peek().msecs, "[ms]");
}
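For reference, the quadrature that getTerm implements can be written out as below. This is only a reading of the code above, not an additional result: with delta = 1/n, each term is the area of a rectangle of width delta under the curve 1/(1 + x^2).

```latex
\pi = 4 \int_0^1 \frac{dx}{1 + x^2}
    \approx 4 \sum_{i=0}^{n-1} \frac{\delta}{1 + x_i^2},
\qquad x_i = (i - 0.5)\,\delta, \qquad \delta = \frac{1}{n}
```

Because iota(n) yields i = 0 .. n-1, the sample points sit one step to the left of the textbook midpoint rule (which would use i = 1 .. n). For n = 1_000_000_000 this shifts the result by roughly 2 * delta, about 2e-9, so it does not affect the timings.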