WTF! Parallel foreach slower than normal foreach on a multicore CPU?

Thu Jun 23 03:05:19 PDT 2011

I'm trying out std.parallelism, and I wrote this code (based on the parallel foreach example):
import std.stdio;
import std.parallelism;
import std.math;
import std.c.time;

void main () {
    auto logs = new double[20_000_000];
    const num = 10;

    clock_t clk;
    double norm;
    double par;

    writeln("CPUs : ", totalCPUs);

    clk = clock();
    foreach (t; 0..num) {
        foreach (i, ref elem; logs) {
            elem = log(i + 1.0);
        }
    }
    norm = clock() - clk;

    clk = clock();
    foreach (t; 0..num) {
        foreach (i, ref elem; taskPool.parallel(logs, 100)) {
            elem = log(i + 1.0);
        }
    }
    par = clock() - clk;

    norm = norm / num;
    par = par / num;

    writeln("Normal : ", norm / CLOCKS_PER_SEC, " Parallel : ", par / CLOCKS_PER_SEC);
}

I get this result:

CPUs : 2
Normal : 1.325 Parallel : 1.646

The result varies by about ±100 ms each time I run it (I think it depends on how busy the CPUs are at that moment).

I tried changing workUnitSize from 1 to 10000000 without any appreciable change.
My computer is an AMD Athlon 64 X2 Dual Core Processor 6000+ running Kubuntu 11.04 64-bit with 2 GiB of RAM. I compiled it with dmd 2.053.
htop shows that while the test program is running the parallel foreach, both cores are at ~98% load, whereas with the normal foreach only one core is at ~99% load.
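
One thing that may be worth cross-checking is the timer itself. The C clock() function reports processor time used by the whole process, and on Linux that is, as far as I know, summed over all threads, so two busy cores could be counted twice. The following is only a minimal sketch of the same comparison using wall-clock time. It assumes a recent compiler where StopWatch lives in std.datetime.stopwatch (in dmd 2.053 I believe it was part of std.datetime), and the helper names fillSerial, fillParallel and wallSeconds are made up for illustration.

```d
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.math : log;
import std.parallelism : taskPool, totalCPUs;
import std.stdio : writeln;

// Hypothetical helper: fill the array with a plain foreach.
void fillSerial(double[] logs) {
    foreach (i, ref elem; logs) {
        elem = log(i + 1.0);
    }
}

// Hypothetical helper: same work, split across the default task pool.
void fillParallel(double[] logs) {
    foreach (i, ref elem; taskPool.parallel(logs, 100)) {
        elem = log(i + 1.0);
    }
}

// Hypothetical helper: average wall-clock seconds per call of fn,
// measured over `runs` consecutive calls.
double wallSeconds(void function(double[]) fn, double[] logs, int runs) {
    auto sw = StopWatch(AutoStart.yes);
    foreach (t; 0 .. runs) {
        fn(logs);
    }
    sw.stop();
    return sw.peek.total!"usecs" / 1e6 / runs;
}

void main() {
    auto logs = new double[20_000_000];
    enum num = 10;

    writeln("CPUs : ", totalCPUs);
    writeln("Normal : ", wallSeconds(&fillSerial, logs, num),
            " Parallel : ", wallSeconds(&fillParallel, logs, num));
}
```

If the parallel figure from this version comes out clearly below the normal one, the numbers above were probably measuring total CPU time rather than elapsed time. If it is still slower, the timer is not the explanation.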