Scalability in std.parallelism

"Nordlöw" per.nordlow at gmail.com
Sat Feb 22 08:21:20 PST 2014


The std.parallelism test code below gives me some interesting 
results when compiled as

dmd -release -noboundscheck -O -inline -w -wi -wi ~/Work/justd/t_parallelism.d -oft_parallelism

My scalability measurements say the following (in each pair, the 
first line is taskPool.reduce and the second is 
std.algorithm.reduce):

3.14159 took 221[ms]
3.14159 took 727[ms]
Speedup 3.28959
-5.80829e+09 took 33[ms]
-5.80829e+09 took 201[ms]
Speedup 6.09091

Why do I get a larger speedup for the simpler map function?
Shouldn't it be the opposite?
I've always read that the more calculations I perform per 
memory access, the better the speedup...

Anyhow the speedups are great!

I'm on an Intel quad-core with 8 hyperthreads.


Sample code follows:



import std.algorithm, std.parallelism, std.range, std.datetime, std.stdio;

void test1 () {
     immutable n = 100_000_000;
     immutable delta = 1.0 / n;

     auto piTerm(int i) {
         immutable x = (i - 0.5) * delta;
         return delta / (1.0 + x*x);
     }

     auto nums = n.iota.map!piTerm; // numbers
     StopWatch sw;

     sw.reset();
     sw.start();
     immutable pi = 4.0*taskPool.reduce!"a+b"(nums);
     sw.stop();
     immutable ms = sw.peek().msecs;
     writeln(pi, " took ", ms, "[ms]");

     sw.reset();
     sw.start();
     immutable pi_ = 4.0*std.algorithm.reduce!"a+b"(nums);
     sw.stop();
     immutable ms_ = sw.peek().msecs;
     writeln(pi_, " took ", ms_, "[ms]");

     writeln("Speedup ", cast(real)ms_ / ms);
}

// Generic square; for int arguments, i*i is computed in int arithmetic.
auto square(T)(T i) @safe pure nothrow { return i*i; }

void test2 () {
     immutable n = 100_000_000;
     immutable delta = 1.0 / n;

     auto nums = n.iota.map!square; // numbers
     StopWatch sw;

     sw.reset();
     sw.start();
     immutable pi = 4.0*taskPool.reduce!"a+b"(nums);
     sw.stop();
     immutable ms = sw.peek().msecs;
     writeln(pi, " took ", ms, "[ms]");

     sw.reset();
     sw.start();
     immutable pi_ = 4.0*std.algorithm.reduce!"a+b"(nums);
     sw.stop();
     immutable ms_ = sw.peek().msecs;
     writeln(pi_, " took ", ms_, "[ms]");

     writeln("Speedup ", cast(real)ms_ / ms);
}

void main () {
     test1();
     test2();
}
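
A side note on test2: n.iota yields int values, so i*i wraps around once i exceeds 46_340, and the running sum is an int as well. That is very likely why the printed result is negative. The sketch below is only an assumption about what an overflow-free variant could look like; the names squareAsDouble and sumOfSquares are made up for illustration and are not part of the original test. It does not by itself explain the difference in speedup.

```d
import std.algorithm : map;
import std.parallelism : taskPool;
import std.range : iota;
import std.stdio : writeln;

// Hypothetical helper: convert to double before multiplying so the
// product cannot wrap around the way int*int does.
double squareAsDouble(int i) @safe pure nothrow { return cast(double) i * i; }

// Hypothetical helper: parallel sum of i*i for i in [0, n).
// The explicit 0.0 seed makes the accumulator a double too.
double sumOfSquares(int n)
{
    auto nums = iota(n).map!squareAsDouble;
    return taskPool.reduce!"a+b"(0.0, nums);
}

void main()
{
    writeln(sumOfSquares(100_000_000));
}
```

With doubles the per-element work is no longer identical to the int version, so timings from this variant are not directly comparable to the numbers above.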

