std.parallelism curious results

flamencofantasy via Digitalmars-d-learn digitalmars-d-learn at puremagic.com
Sun Oct 5 07:27:59 PDT 2014


Hello,

I am summing the first 1 billion integers, once in parallel and 
once in a single thread, and I'm observing some curious results:

parallel sum : 499999999500000000, elapsed 102833 ms
single thread sum : 499999999500000000, elapsed 1667 ms
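
As a sanity check on both results, the sum of the integers 0 through n-1 has the closed form n(n-1)/2. The sketch below evaluates it for n = 1 billion; closedFormSum is a hypothetical helper introduced only for this illustration.

```d
import std.stdio : writeln;

// Hypothetical helper (not part of the original code): closed-form
// sum of the integers 0 .. n-1, i.e. n * (n - 1) / 2.
ulong closedFormSum(ulong n)
{
	// Exactly one of n and n - 1 is even; halve that factor first so
	// the intermediate product stays as small as possible.
	return (n % 2 == 0) ? (n / 2) * (n - 1) : n * ((n - 1) / 2);
}

void main()
{
	writeln(closedFormSum(1_000_000_000UL)); // prints 499999999500000000
}
```

The value matches both sums reported above, so the two versions differ only in elapsed time, not in the result.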

The parallel version is 60+ times slower on my i7-3770K CPU. I 
think that may be due to the CPU constantly flushing and reloading 
the caches in the parallel version, but I don't know for sure.

Here is the D code:

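	import core.atomic : atomicLoad, atomicOp;
	// StopWatch now lives in std.datetime.stopwatch (formerly std.datetime).
	import std.datetime.stopwatch : StopWatch;
	import std.parallelism : parallel;
	import std.range : iota;
	import std.stdio : writefln;

	void main()
	{
		shared ulong sum = 0;
		ulong iter = 1_000_000_000UL;

		StopWatch sw;

		sw.start();

		// Every iteration does an atomic add on the same shared variable.
		foreach(i; parallel(iota(0, iter)))
		{
			atomicOp!"+="(sum, i);
		}

		sw.stop();

		writefln("parallel sum : %s, elapsed %s ms", atomicLoad(sum), sw.peek.total!"msecs");

		// Plain local for the single-threaded loop; no atomics needed here.
		ulong serialSum = 0;

		sw.reset();

		sw.start();

		for (ulong i = 0; i < iter; ++i)
		{
			serialSum += i;
		}

		sw.stop();

		writefln("single thread sum : %s, elapsed %s ms", serialSum, sw.peek.total!"msecs");
	}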
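
One way to test the cache hypothesis would be a variant in which each worker keeps a private partial sum and the partial sums are combined only at the end, so no atomic operation is needed per element. The following is a minimal sketch under that assumption, using taskPool.reduce from std.parallelism; parallelSum is a hypothetical helper, and no timings are claimed for it.

```d
import std.parallelism : taskPool;
import std.range : iota;
import std.stdio : writefln;

// Hypothetical helper (not part of the original code): sums 0 .. n-1
// with taskPool.reduce. Each work unit is reduced into its own partial
// result, and the partial results are then combined, so the loop body
// never touches a shared variable.
ulong parallelSum(ulong n)
{
	return taskPool.reduce!"a + b"(0UL, iota(0UL, n));
}

void main()
{
	enum ulong iter = 1_000_000_000UL;
	writefln("reduce sum : %s", parallelSum(iter));
}
```

If this variant runs close to the single-threaded time or faster, that would point at contention on the shared variable rather than at std.parallelism itself.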

Out of curiosity I tried the equivalent code in C# and I got this:

parallel sum : 499999999500000000, elapsed 20320 ms
single thread sum : 499999999500000000, elapsed 1901 ms

The C# parallel version is about 5 times faster than the D parallel 
version (20320 ms vs. 102833 ms), which is strange on the exact same CPU.

And here is the C# code:

using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

long sum = 0;
long iter = 1000000000L;

var sw = Stopwatch.StartNew();

Parallel.For(0, iter, i =>
{
	Interlocked.Add(ref sum, i);
});

Console.WriteLine("parallel sum : {0}, elapsed {1} ms", sum, sw.ElapsedMilliseconds);

sum = 0;

sw = Stopwatch.StartNew();

for (long i = 0; i < iter; ++i)
{
	sum += i;
}

Console.WriteLine("single thread sum : {0}, elapsed {1} ms", sum, sw.ElapsedMilliseconds);

Thoughts?

