std.parallelism curious results
Russel Winder via Digitalmars-d-learn
digitalmars-d-learn at puremagic.com
Sun Oct 5 10:02:21 PDT 2014
On 05/10/14 15:27, flamencofantasy via Digitalmars-d-learn wrote:
> Hello,
>
> I am summing up the first 1 billion integers in parallel and in a
> single thread and I'm observing some curious results;
I am fairly certain that your use of "parallel for" introduces quite a
lot of threads other than your "master" one.
> parallel sum : 499999999500000000, elapsed 102833 ms
> single thread sum : 499999999500000000, elapsed 1667 ms
>
> The parallel version is 60+ times slower on my i7-3770K CPU. I
> think that may be due to the CPU constantly flushing and reloading
> the caches in the parallel version but I don't know for sure.
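I would bet there are cache problems, but it is far more likely that the
core problem is all the thread activity and in particular all the
synchronization. (The two are linked: every atomic add needs exclusive
ownership of the cache line holding the shared total, so that line
keeps bouncing between cores.)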
> Here is the D code;
>
> shared ulong sum = 0;
> ulong iter = 1_000_000_000UL;
>
> StopWatch sw;
>
> sw.start();
>
> foreach(i; parallel(iota(0, iter))) { atomicOp!"+="(sum, i); }
Well, that will be the problem then: lots and lots of synchronization,
with a billion atomic updates all contending for the one shared
variable. I am highly surprised this is only 60 times slower than
sequential!
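A minimal sketch of removing most of that synchronization, assuming a reasonably current D compiler and Phobos: give each chunk of the range its own unsynchronized local accumulator and apply `atomicOp` only once per chunk. `chunkedSum` and its `chunkSize` default of one million are hypothetical choices for illustration, not part of `std.parallelism` itself.

```d
import core.atomic : atomicLoad, atomicOp;
import std.algorithm.comparison : min;
import std.parallelism : parallel;
import std.range : iota;
import std.stdio : writefln;

// Hypothetical helper: sums 0 .. n-1 in parallel. Each chunk is summed
// into a plain local variable, and only the chunk total is added to the
// shared accumulator, so there is one atomic operation per chunk rather
// than one per element.
ulong chunkedSum(ulong n, ulong chunkSize = 1_000_000)
{
    shared ulong sum = 0;
    immutable ulong nChunks = (n + chunkSize - 1) / chunkSize;

    foreach (c; parallel(iota(0UL, nChunks)))
    {
        ulong local = 0; // private to this iteration, no synchronization
        immutable ulong lo = c * chunkSize;
        immutable ulong hi = min(lo + chunkSize, n);
        for (ulong i = lo; i < hi; ++i)
            local += i;
        atomicOp!"+="(sum, local); // one synchronized update per chunk
    }
    return atomicLoad(sum);
}

void main()
{
    enum ulong iter = 1_000_000_000UL;
    writefln("chunked parallel sum : %s", chunkedSum(iter));
}
```

With the original `iter` of one billion this performs about a thousand atomic additions instead of a billion, and should print 499999999500000000, the same total as both of the original runs.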
> sw.stop();
>
> writefln("parallel sum : %s, elapsed %s ms", sum,
> sw.peek().msecs);
>
> sum = 0;
>
> sw.reset();
>
> sw.start();
>
> for (ulong i = 0; i < iter; ++i) { sum += i; }
>
> sw.stop();
>
> writefln("single thread sum : %s, elapsed %s ms", sum,
> sw.peek().msecs);
>
> Out of curiosity I tried the equivalent code in C# and I got this;
>
> parallel sum : 499999999500000000, elapsed 20320 ms
> single thread sum : 499999999500000000, elapsed 1901 ms
>
> The C# parallel is about 5 times faster than the D parallel, which
> is strange on the exact same CPU.
>
> And here is the C# code;
>
> long sum = 0;
> long iter = 1000000000L;
>
> var sw = Stopwatch.StartNew();
>
> Parallel.For(0, iter, i => { Interlocked.Add(ref sum, i); });
The useful moral of this story is that C# synchronization in this
(somewhat perverse) context is relatively much more efficient than
that of D.
There is almost certainly a useful benchmark test that can come out of
this for the std.parallelism implementation (if only I had a few
cycles to get really stuck into a review and analysis of the module :-( ).
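As a possible starting point for such a benchmark, here is a minimal sketch in D that times the per-element atomic pattern against `taskPool.reduce` and the sequential loop. It assumes a Phobos recent enough to have `std.datetime.stopwatch` (newer than this thread; at the time `StopWatch` lived in `std.datetime`), and the function names are hypothetical.

```d
import core.atomic : atomicLoad, atomicOp;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.parallelism : parallel, taskPool;
import std.range : iota;
import std.stdio : writefln;

// The pattern from the original post: one atomic read-modify-write per
// element, with every worker thread contending for the same variable.
ulong atomicPerElement(ulong n)
{
    shared ulong sum = 0;
    foreach (i; parallel(iota(0UL, n)))
        atomicOp!"+="(sum, i);
    return atomicLoad(sum);
}

// No shared state: taskPool.reduce sums each work unit separately and
// combines the partial sums at the end. The 0UL seed (an identity for +)
// fixes the accumulator type as ulong.
ulong reduceSum(ulong n)
{
    return taskPool.reduce!"a + b"(0UL, iota(0UL, n));
}

// Plain sequential loop, as in the original post.
ulong sequentialSum(ulong n)
{
    ulong sum = 0;
    for (ulong i = 0; i < n; ++i)
        sum += i;
    return sum;
}

// Times one variant and reports it in the same format as the post.
void bench(alias f)(string name, ulong n)
{
    auto sw = StopWatch(AutoStart.yes);
    immutable ulong result = f(n);
    sw.stop();
    writefln("%s : %s, elapsed %s ms", name, result, sw.peek.total!"msecs");
}

void main()
{
    // Deliberately smaller than the original 1 billion so that the
    // per-element atomic variant finishes in reasonable time.
    enum ulong n = 10_000_000UL;
    bench!atomicPerElement("parallel atomic sum", n);
    bench!reduceSum("parallel reduce sum", n);
    bench!sequentialSum("single thread sum", n);
}
```

An optimizing compiler may turn the sequential loop into a closed-form computation, so its timing is best treated with some care when comparing.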
> Console.WriteLine("parallel sum : {0}, elapsed {1} ms", sum,
> sw.ElapsedMilliseconds);
>
> sum = 0;
>
> sw = Stopwatch.StartNew();
>
> for (long i = 0; i < iter; ++i) { sum += i; }
>
> Console.WriteLine("single thread sum : {0}, elapsed {1} ms", sum,
> sw.ElapsedMilliseconds);
>
> Thoughts?
--
Russel.
=============================================================================
Dr Russel Winder t: +44 20 7585 2200 voip: sip:russel.winder at ekiga.net
41 Buckmaster Road m: +44 7770 465 077 xmpp: russel at winder.org.uk
London SW11 1EN, UK w: www.russel.org.uk skype: russel_winder