Concurrency and program speed

Thu Feb 28 06:15:41 PST 2013

Hello all,

I'm in need of some guidance regarding std.concurrency.  Before writing further, 
I should add that I'm an almost complete novice where concurrency is concerned, 
in general and particularly with D: I've written a few programs that made use of 
std.parallelism but that's about it.

In this case, there's a strong need to use std.concurrency because the functions 
that will be run in parallel involve generating substantial quantities of random 
numbers.  AFAICS std.parallelism just isn't safe for that, in a statistical 
sense (no idea how it might slow things down in terms of shared access to a 
common rndGen).

Now, I'm not naive enough to believe that using n threads will simply result in 
the program runtime being divided by n.  However, the results I'm getting with 
some simple test code (attached) are curious and I'd like to understand better 
what's going on.

The program is simple enough:

       foreach(i; iota(n))
             spawn(&randomFunc, m);

... where randomFunc is a function that generates and sums m different random 
numbers.  For speed comparison one can do instead,

       foreach(i; iota(n))
             randomFunc(m);

I chose m = 100_000_000 for my tests.
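
For readers without the attachment, here is a minimal sketch of what such a 
test program might look like.  It is not the attached concur.d itself: the 
helper name sumRandom, the use of uniform(0.0, 1.0) and the printing of each 
sum are assumptions made purely for illustration.

```d
import std.concurrency : spawn;
import std.random : uniform;
import std.range : iota;
import std.stdio : writeln;

// Hypothetical helper: generate m random numbers in [0, 1) and sum them.
double sumRandom(size_t m)
{
    double total = 0.0;
    foreach (i; 0 .. m)
        total += uniform(0.0, 1.0);  // draws from the default generator, rndGen
    return total;
}

// Thread entry point: does the work and prints the sum so that the
// loop has an observable result.
void randomFunc(size_t m)
{
    writeln(sumRandom(m));
}

void main()
{
    enum size_t n = 2;            // number of tasks
    enum size_t m = 100_000_000;  // random numbers per task

    // Concurrent version: one spawned thread per task.
    foreach (i; iota(n))
        spawn(&randomFunc, m);

    // Sequential version for comparison: replace the loop body with
    //     randomFunc(m);
}
```

Wall-clock, "user" time and CPU usage figures like the ones below can be 
obtained by running each variant under the shell's time command.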

Setting n = 2 on my 4-core laptop, the sequential case runs in about 4 s; the 
concurrent version using spawn() runs in about 2.2 s (the total amount of "user" 
time reported for the sequential and concurrent programs is about 4 s and about 
4.3 s respectively).  So, roughly half the runtime, as you might expect.

Setting n = 3, the sequential case runs in about 6 s (surprise!), the concurrent 
version in about 3 s (with about 8.1 s of "user" time recorded).  In other words, 
the concurrent program is still only about twice as fast as the sequential 
version, even though there's no shared data and the CPU can well accommodate the 
3 threads at full speed.  (In fact 270% CPU usage is recorded, but even that 
should still result in a faster program.)

Setting n = 4, the sequential case runs in about 8 s, the concurrent in about 
3.8 s (with 14.8 s of "user" time recorded), with 390% CPU usage.

In other words, it doesn't seem possible to get more than about a 2x speedup on 
my system from using concurrency, even though there should not be any data races 
or other factors that might explain slower performance.
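
To make the arithmetic explicit, the following small sketch computes the 
speedup (sequential wall time divided by concurrent wall time) from the 
approximate timings quoted above.  The struct Run and the function speedup are 
names invented for this illustration, and the last column compares the 
concurrent "user" time with the sequential wall time, on the assumption that 
the single-threaded run spends nearly all of its wall time in user code.

```d
import std.stdio : writefln;

// Speedup = sequential wall time / concurrent wall time.
double speedup(double seqSeconds, double concSeconds)
{
    return seqSeconds / concSeconds;
}

// Hypothetical record type holding the approximate figures from this post.
struct Run
{
    int n;            // number of tasks
    double seq;       // sequential wall time, seconds
    double conc;      // concurrent wall time, seconds
    double concUser;  // concurrent "user" time, seconds
}

void main()
{
    immutable Run[] runs = [
        Run(2, 4.0, 2.2, 4.3),
        Run(3, 6.0, 3.0, 8.1),
        Run(4, 8.0, 3.8, 14.8),
    ];

    foreach (r; runs)
        writefln("n = %d: speedup %.2f, user time / sequential time %.2f",
                 r.n, speedup(r.seq, r.conc), r.concUser / r.seq);
}
```

The speedup stays close to 2 for every n, while the total "user" time grows 
much faster than the amount of work, which is the part I can't explain.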

I didn't expect runtime / n, but I did expect something a little better than this 
-- so can anyone suggest what might be going on here?  (Unfortunately, I don't 
have a system with a greater number of cores on which to test with greater 
numbers of threads.)

The times reported here are for programs compiled with GDC, but using LDC or DMD 
produces similar behaviour.

Can anyone advise?

Thanks & best wishes,

     -- Joe
-------------- next part --------------
A non-text attachment was scrubbed...
Name: concur.d
Type: text/x-dsrc
Size: 341 bytes
Desc: not available
URL: <http://lists.puremagic.com/pipermail/digitalmars-d-learn/attachments/20130228/d4d707de/attachment-0001.d>