Scalability in std.parallelism
"Nordlöw"
per.nordlow at gmail.com
Sat Feb 22 08:21:20 PST 2014
In the test code given below exercising std.parallelism I get
some interesting results when compiled as
dmd -release -noboundscheck -O -inline -w -wi
~/Work/justd/t_parallelism.d -oft_parallelism
My scalability measurements say the following:
3.14159 took 221[ms]
3.14159 took 727[ms]
Speedup 3.28959
-5.80829e+09 took 33[ms]
-5.80829e+09 took 201[ms]
Speedup 6.09091
Why do I get a larger speedup for the simpler map function?
Shouldn't it be the opposite?
I've always read that the more calculations I perform per
memory access, the better the speedup...
Anyhow the speedups are great!
I'm sitting on an Intel quad core with 8 hyperthreads.
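
As a follow-up experiment I might vary the work unit size for the
cheap square kernel, to see how much of the timing is per-work-unit
scheduling overhead rather than per-element work. A rough, untested
sketch (assuming TaskPool.reduce accepts an explicit work unit size
as a trailing argument, as the std.parallelism docs seem to suggest):

import std.algorithm, std.parallelism, std.range, std.datetime, std.stdio;

auto square(T)(T i) @safe pure nothrow { return i*i; }

void main()
{
    immutable n = 100_000_000;
    auto nums = n.iota.map!square;

    // Time the same parallel reduction with different work unit sizes.
    foreach (workUnitSize; [1_000, 100_000, 10_000_000])
    {
        StopWatch sw;
        sw.start();
        immutable sum = taskPool.reduce!"a+b"(nums, workUnitSize);
        sw.stop();
        writeln("workUnitSize=", workUnitSize, ": ", sum,
                " took ", sw.peek().msecs, "[ms]");
    }
}

If the timing varies a lot with the work unit size, then task
granularity rather than memory traffic probably dominates the
difference between the two tests.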
Sample code follows:
import std.algorithm, std.parallelism, std.range, std.datetime,
std.stdio;

void test1()
{
    immutable n = 100_000_000;
    immutable delta = 1.0 / n;

    // One term of the rectangle-rule quadrature for pi.
    auto piTerm(int i)
    {
        immutable x = (i - 0.5) * delta;
        return delta / (1.0 + x*x);
    }

    auto nums = n.iota.map!piTerm; // lazy range of terms

    StopWatch sw;

    // Parallel reduction via std.parallelism.
    sw.reset();
    sw.start();
    immutable pi = 4.0 * taskPool.reduce!"a+b"(nums);
    sw.stop();
    immutable ms = sw.peek().msecs;
    writeln(pi, " took ", ms, "[ms]");

    // Serial reduction via std.algorithm for comparison.
    sw.reset();
    sw.start();
    immutable pi_ = 4.0 * std.algorithm.reduce!"a+b"(nums);
    sw.stop();
    immutable ms_ = sw.peek().msecs;
    writeln(pi_, " took ", ms_, "[ms]");

    writeln("Speedup ", cast(real)ms_ / ms);
}

// For i >= 46_341 i*i overflows int, which is why test2 prints a
// negative total; that doesn't matter for the timing comparison.
auto square(T)(T i) @safe pure nothrow { return i*i; }

void test2()
{
    immutable n = 100_000_000;
    immutable delta = 1.0 / n; // unused here, kept for symmetry with test1

    auto nums = n.iota.map!square; // lazy range of squares

    StopWatch sw;

    // Parallel reduction via std.parallelism.
    sw.reset();
    sw.start();
    immutable pi = 4.0 * taskPool.reduce!"a+b"(nums);
    sw.stop();
    immutable ms = sw.peek().msecs;
    writeln(pi, " took ", ms, "[ms]");

    // Serial reduction via std.algorithm for comparison.
    sw.reset();
    sw.start();
    immutable pi_ = 4.0 * std.algorithm.reduce!"a+b"(nums);
    sw.stop();
    immutable ms_ = sw.peek().msecs;
    writeln(pi_, " took ", ms_, "[ms]");

    writeln("Speedup ", cast(real)ms_ / ms);
}

void main()
{
    test1();
    test2();
}