Concurrency and program speed
Joseph Rushton Wakeling
joseph.wakeling at webdrake.net
Thu Feb 28 06:15:41 PST 2013
Hello all,
I'm in need of some guidance regarding std.concurrency. Before writing further,
I should add that I'm an almost complete novice where concurrency is concerned,
in general and particularly with D: I've written a few programs that made use of
std.parallelism but that's about it.
In this case, there's a strong need to use std.concurrency because the functions
that will be run in parallel involve generating substantial quantities of random
numbers. AFAICS std.parallelism just isn't safe for that, in a statistical
sense (no idea how it might slow things down in terms of shared access to a
common rndGen).
Now, I'm not naive enough to believe that using n threads will simply result in
the program runtime being divided by n. However, the results I'm getting with
some simple test code (attached) are curious and I'd like to understand better
what's going on.
The program is simple enough:
    foreach(i; iota(n))
        spawn(&randomFunc, m);
... where randomFunc is a function that generates and sums m different random
numbers. For speed comparison one can do instead,
    foreach(i; iota(n))
        randomFunc(m);
I chose m = 100_000_000 for my tests.
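For readers without the attachment, here is a minimal sketch of what such a test program might look like. It is not the attached concur.d: the helper sumRandom, the use of uniform(0.0, 1.0) and the printing of the sum are all assumptions made for illustration.

```d
import std.concurrency : spawn;
import std.random : uniform;
import std.range : iota;
import std.stdio : writeln;

// Hypothetical stand-in for the body of randomFunc: draws m uniform
// random numbers in [0, 1) and returns their sum.
double sumRandom(size_t m)
{
    double sum = 0.0;
    foreach (i; 0 .. m)
        sum += uniform(0.0, 1.0);   // uses the thread-local default rndGen
    return sum;
}

// Thread entry point passed to spawn().
void randomFunc(size_t m)
{
    writeln(sumRandom(m));
}

void main()
{
    enum size_t n = 2;             // number of threads
    enum size_t m = 100_000_000;   // random numbers per thread

    foreach (i; iota(n))
        spawn(&randomFunc, m);
    // The runtime waits for the spawned threads before the program exits.
}
```

Replacing the spawn() call with a direct call to randomFunc(m) gives the sequential variant used for comparison.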
Setting n = 2 on my 4-core laptop, the sequential case runs in about 4 s; the
concurrent version using spawn() runs in about 2.2 s (the total "user" time
reported for the sequential and concurrent programs is about 4 s and about
4.3 s respectively). So, roughly half the runtime, as you might expect.
Setting n = 3, the sequential case runs in about 6 s (surprise!), the concurrent
version in about 3 s (with about 8.1 s of "user" time recorded). In other words,
the runtime is still only half that of the sequential version, even though
there's no shared data and the CPU can well accommodate the 3 threads at full
speed. (In fact 270% CPU usage is recorded, but that should still yield a faster
program.)
Setting n = 4, the sequential case runs in 8 s, the concurrent in about 3.8 s
(with 14.8 s of "user" time recorded), with 390% CPU usage.
In other words, it doesn't seem possible to get more than about a 2x speedup on
my system from using concurrency, even though there should not be any data races
or other factors that might explain slower performance.
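To make the comparison explicit, the timings above can be turned into speedup and per-thread efficiency figures. The sketch below simply redoes that arithmetic; the function names are hypothetical. Because sequential "user" time was only reported for n = 2, the last ratio assumes the sequential wall-clock time is a reasonable proxy for it.

```d
import std.stdio : writefln;

// Speedup: sequential wall-clock time divided by concurrent wall-clock time.
double speedup(double seqTime, double concTime)
{
    return seqTime / concTime;
}

// Parallel efficiency: speedup per thread (1.0 would be ideal scaling).
double efficiency(double seqTime, double concTime, size_t n)
{
    return speedup(seqTime, concTime) / n;
}

void main()
{
    // Approximate timings in seconds, as reported above.
    immutable size_t[] threads = [2, 3, 4];
    immutable double[] seq  = [4.0, 6.0, 8.0];    // sequential wall-clock
    immutable double[] conc = [2.2, 3.0, 3.8];    // concurrent wall-clock
    immutable double[] user = [4.3, 8.1, 14.8];   // concurrent "user" time

    foreach (i, n; threads)
    {
        // user[i] / seq[i] uses sequential wall-clock time as a proxy for
        // sequential "user" time (an assumption for n = 3 and n = 4).
        writefln("n = %s: speedup %.2f, efficiency %.2f, user/sequential %.2f",
                 n, speedup(seq[i], conc[i]),
                 efficiency(seq[i], conc[i], n), user[i] / seq[i]);
    }
}
```

The speedups come out at roughly 1.8, 2.0 and 2.1 for n = 2, 3 and 4, while the total "user" time grows from slightly above the sequential figure to nearly double it.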
I didn't expect runtime / n, but I did expect something a little better than this
-- so can anyone suggest what might be going on here? (Unfortunately, I don't
have a system with a greater number of cores on which to test with greater
numbers of threads.)
The times reported here are for programs compiled with GDC, but using LDC or DMD
produces similar behaviour.
Can anyone advise?
Thanks & best wishes,
-- Joe
-------------- next part --------------
A non-text attachment was scrubbed...
Name: concur.d
Type: text/x-dsrc
Size: 341 bytes
Desc: not available
URL: <http://lists.puremagic.com/pipermail/digitalmars-d-learn/attachments/20130228/d4d707de/attachment-0001.d>