WTF! Parallel foreach slower than normal foreach on a multicore CPU?

Fri Jun 24 00:40:27 PDT 2011

On Thu, 23 Jun 2011 23:18:36 +0000, Zardoz wrote:

> Code :
>  auto logs = new double[200];
>  const num = 2;
>  clock_t clk;
>  double norm;
>  double par;
>  writeln("CPUs : ",totalCPUs );
>  clk = clock();
>  foreach(i, ref elem; logs) {
>   elem = log(i + 1.0);
>  }
>  norm = clock() -clk;
>  clk = clock();
>  foreach(i, ref elem; taskPool.parallel(logs, 100)) {
>   elem = log(i + 1.0);
>  }
> 
> I get the same problem: parallel foreach is slower than normal
> foreach. And it's the same code as the library example, which claims
> that parallel foreach does it in approx. half the time on an Athlon X2

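I was able to reproduce your results. I think the problem is with
clock(): on POSIX systems it reports the CPU time used by the whole
process, summed over all threads, not the elapsed wall-clock time, so
a loop spread over several cores does not look any faster. Try
StopWatch, here with a much larger array (200_000_000 elements instead
of 200):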

import std.parallelism;
import std.stdio;
import std.math;
import std.datetime;

void main()
{
    auto logs = new double[200_000_000];

    writeln("CPUs : ",totalCPUs );

    {
        StopWatch stopWatch;
        stopWatch.start();

        foreach(i, ref elem; logs) {
            elem = log(i + 1.0);
        }

        writeln(stopWatch.peek().msecs);
    }

    {
        StopWatch stopWatch;
        stopWatch.start();

        foreach(i, ref elem; parallel(logs)) {
            elem = log(i + 1.0);
        }

        writeln(stopWatch.peek().msecs);
    }
}

Here is my output:

CPUs : 4
8061
2686
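
To see the difference between the two timers directly, here is a
minimal sketch that prints wall-clock time next to what clock()
reports for the same loops. It assumes a recent compiler, where
StopWatch lives in std.datetime.stopwatch and peek() returns a
Duration. The helper names (fillSerial, fillParallel, measure) and the
smaller array size are my own choices for illustration, not part of
the program above. On a POSIX system I would expect the clock() figure
for the parallel loop to stay close to the serial one even though the
wall-clock time drops.

```d
import core.stdc.time : clock, CLOCKS_PER_SEC;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.math : log;
import std.parallelism : parallel;
import std.stdio : writefln;

// Hypothetical helpers, named here only for illustration.
void fillSerial(double[] logs)
{
    foreach (i, ref elem; logs)
        elem = log(i + 1.0);
}

void fillParallel(double[] logs)
{
    foreach (i, ref elem; parallel(logs))
        elem = log(i + 1.0);
}

// Runs fill(logs) and prints wall-clock time next to clock() time.
void measure(string label, void function(double[]) fill, double[] logs)
{
    auto sw = StopWatch(AutoStart.yes);
    immutable start = clock();
    fill(logs);
    immutable cpuMs = (clock() - start) * 1000.0 / CLOCKS_PER_SEC;
    writefln("%-8s wall: %s ms, clock(): %.0f ms",
             label, sw.peek.total!"msecs", cpuMs);
}

void main()
{
    // Assumption: smaller than 200_000_000 to keep memory use modest.
    auto logs = new double[20_000_000];
    measure("serial", &fillSerial, logs);
    measure("parallel", &fillParallel, logs);
}
```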

I get similar results whether or not I pass a work unit size of
100_000 as the second argument to parallel().
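
If you want to try different work unit sizes yourself, something like
the following sketch should do. It again assumes the
std.datetime.stopwatch module of recent compilers; the helper name
timeParallel, the array size, and the list of sizes are made up for
the example.

```d
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.math : log;
import std.parallelism : parallel;
import std.stdio : writefln;

// Hypothetical helper: fills logs in parallel using the given work
// unit size and returns the elapsed wall-clock time in milliseconds.
long timeParallel(double[] logs, size_t workUnitSize)
{
    auto sw = StopWatch(AutoStart.yes);
    foreach (i, ref elem; parallel(logs, workUnitSize))
        elem = log(i + 1.0);
    return sw.peek.total!"msecs";
}

void main()
{
    // Assumption: an array large enough for the loop to take
    // measurable time, but smaller than 200_000_000.
    auto logs = new double[20_000_000];

    // Illustrative sizes only; 100_000 is the value mentioned above.
    size_t[] sizes = [100, 10_000, 100_000];
    foreach (workUnitSize; sizes)
        writefln("work unit size %7s: %s ms",
                 workUnitSize, timeParallel(logs, workUnitSize));
}
```

With only 200 elements, as in the original snippet, the whole loop is
too short for any work unit size to matter much.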

Ali