std.parallelism curious results

Russel Winder via Digitalmars-d-learn digitalmars-d-learn at puremagic.com
Sun Oct 5 10:02:21 PDT 2014


On 05/10/14 15:27, flamencofantasy via Digitalmars-d-learn wrote:
> Hello,
> 
> I am summing up the first 1 billion integers in parallel and in a
> single thread and I'm observing some curious results;

I am fairly certain that your use of "parallel for" introduces quite a
lot of threads other than your "master" one.

> parallel sum : 499999999500000000, elapsed 102833 ms
> single thread sum : 499999999500000000, elapsed 1667 ms
> 
> The parallel version is 60+ times slower on my i7-3770K CPU. I
> think that may be due to the CPU constantly flushing and reloading
> the caches in the parallel version, but I don't know for sure.

I would bet there are cache problems, but it is far more likely that
the core problem is all the thread activity and, in particular, all
the synchronization.
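
To illustrate the point about synchronization: a minimal sketch,
assuming the work is split into coarse chunks so that each chunk is
summed in a plain local variable and the shared total is touched once
per chunk rather than once per element. The name chunkedSum and the
chunk size of 1_000_000 are invented for this illustration and are
not tuned.

```d
import core.atomic : atomicLoad, atomicOp;
import std.algorithm.comparison : min;
import std.parallelism : parallel;
import std.range : iota;
import std.stdio : writeln;

// Hypothetical helper: sums 0 .. n (exclusive) with one atomic
// update per chunk instead of one per element.
ulong chunkedSum(ulong n, ulong chunk = 1_000_000)
{
    shared ulong total = 0;

    // Each parallel iteration receives the start index of one chunk.
    foreach (start; parallel(iota(0UL, n, chunk)))
    {
        immutable end = min(start + chunk, n);

        // Unsynchronized accumulation in a variable local to this iteration.
        ulong local = 0;
        foreach (i; start .. end)
            local += i;

        // The only synchronized step: once per chunk.
        atomicOp!"+="(total, local);
    }
    return atomicLoad(total);
}

void main()
{
    writeln(chunkedSum(1_000_000_000UL));
}
```

With a billion elements and this chunk size there are about a
thousand atomic updates in total, instead of a billion.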

> Here is the D code;
> 
> shared ulong sum = 0;
> ulong iter = 1_000_000_000UL;
> 
> StopWatch sw;
> 
> sw.start();
> 
> foreach(i; parallel(iota(0, iter))) { atomicOp!"+="(sum, i); }

Well, that will be the problem then: lots and lots of synchronization,
with each of the billion iterations doing an atomic update on the same
shared variable. I am highly surprised this is only 60 times slower
than sequential!
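
For comparison, a sketch of the same sum written without any shared
accumulator, assuming taskPool.reduce from std.parallelism is
acceptable for the job. It reduces each work unit separately and then
combines the partial results, so there is no atomic operation per
element. The function name parallelSum is made up for this example.

```d
import std.parallelism : taskPool;
import std.range : iota;
import std.stdio : writeln;

// Hypothetical helper: sums 0 .. n (exclusive) using a parallel reduce.
ulong parallelSum(ulong n)
{
    // The seed 0UL fixes the accumulator type to ulong.
    return taskPool.reduce!"a + b"(0UL, iota(0UL, n));
}

void main()
{
    // For n = 1_000_000_000 the sum is 499999999500000000,
    // the same value reported in the original post.
    writeln(parallelSum(1_000_000_000UL));
}
```

Addition of unsigned integers is associative, which is what makes it
safe to combine partial sums in whatever order the pool produces
them.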

> sw.stop();
> 
> writefln("parallel sum : %s, elapsed %s ms", sum,
> sw.peek().msecs);
> 
> sum = 0;
> 
> sw.reset();
> 
> sw.start();
> 
> for (ulong i = 0; i < iter; ++i) { sum += i; }
> 
> sw.stop();
> 
> writefln("single thread sum : %s, elapsed %s ms", sum, 
> sw.peek().msecs);
> 
> Out of curiosity I tried the equivalent code in C# and I got this;
> 
> parallel sum : 499999999500000000, elapsed 20320 ms
> single thread sum : 499999999500000000, elapsed 1901 ms
> 
> The C# parallel is about 5 times faster than the D parallel, which
> is strange on the exact same CPU.
> 
> And here is the C# code;
> 
> long sum = 0;
> long iter = 1000000000L;
> 
> var sw = Stopwatch.StartNew();
> 
> Parallel.For(0, iter, i => { Interlocked.Add(ref sum, i); });

The useful moral of this story is that, in this (somewhat perverse)
context, C# synchronization is much more efficient than that of D.

There is almost certainly a useful benchmark test that can come of
this for the std.parallelism implementation. If only I had a few
cycles to get really stuck into a review and analysis of the module :-(
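
A rough sketch of what such a benchmark could look like, and not a
proposal for the module's actual test suite. It assumes a recent
Phobos in which StopWatch lives in std.datetime.stopwatch (the
original post used the older std.datetime one). The names atomicSum,
reduceSum, sequentialSum and report, and the problem size n, are all
invented here.

```d
import core.atomic : atomicLoad, atomicOp;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.parallelism : parallel, taskPool;
import std.range : iota;
import std.stdio : writefln;

// One atomic update per element, as in the original post.
ulong atomicSum(ulong n)
{
    shared ulong sum = 0;
    foreach (i; parallel(iota(0UL, n)))
        atomicOp!"+="(sum, i);
    return atomicLoad(sum);
}

// Parallel reduce, no shared accumulator.
ulong reduceSum(ulong n)
{
    return taskPool.reduce!"a + b"(0UL, iota(0UL, n));
}

// Plain single-threaded loop as the baseline.
ulong sequentialSum(ulong n)
{
    ulong sum = 0;
    foreach (i; 0 .. n)
        sum += i;
    return sum;
}

// Runs one variant and prints its result and wall-clock time.
void report(string label, ulong function(ulong) f, ulong n)
{
    auto sw = StopWatch(AutoStart.yes);
    immutable result = f(n);
    sw.stop();
    writefln("%-10s sum : %s, elapsed %s ms",
             label, result, sw.peek.total!"msecs");
}

void main()
{
    // Smaller than the original billion so the atomic variant
    // finishes quickly; an arbitrary choice.
    enum ulong n = 10_000_000;
    report("atomic", &atomicSum, n);
    report("reduce", &reduceSum, n);
    report("sequential", &sequentialSum, n);
}
```

Timings will depend heavily on the compiler, optimization flags and
core count, so any numbers from this are only meaningful relative to
each other on one machine.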

> Console.WriteLine("parallel sum : {0}, elapsed {1} ms", sum, 
> sw.ElapsedMilliseconds);
> 
> sum = 0;
> 
> sw = Stopwatch.StartNew();
> 
> for (long i = 0; i < iter; ++i) { sum += i; }
> 
> Console.WriteLine("single thread sum : {0}, elapsed {1} ms", sum, 
> sw.ElapsedMilliseconds);
> 
> Thoughts?


-- 
Russel.
=============================================================================
Dr Russel Winder      t: +44 20 7585 2200   voip: sip:russel.winder at ekiga.net
41 Buckmaster Road    m: +44 7770 465 077   xmpp: russel at winder.org.uk
London SW11 1EN, UK   w: www.russel.org.uk  skype: russel_winder