Concurrency and program speed

Sean Kelly sean at invisibleduck.org
Thu Feb 28 07:03:38 PST 2013


Does the laptop really have 4 cores, or is it 2 cores with hyperthreading?  My guess is the latter, and that will account for part of the timing you're seeing.  Other processes running on the system will also compete for those cores.  Do larger jobs show a better or worse speedup?
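
For what it's worth, std.parallelism.totalCPUs reports how many logical CPUs the D runtime sees, but it counts hyperthreads as separate cores, so a 2-core/4-thread laptop still reports 4.  A minimal check (just a sketch):

    import std.parallelism : totalCPUs;
    import std.stdio : writeln;

    void main()
    {
        // totalCPUs reports logical CPUs, so hyperthreads are included;
        // it cannot distinguish 4 physical cores from 2 cores + SMT.
        writeln("logical CPUs: ", totalCPUs);
    }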

On Feb 28, 2013, at 6:15 AM, Joseph Rushton Wakeling <joseph.wakeling at webdrake.net> wrote:

> Hello all,
> 
> I'm in need of some guidance regarding std.concurrency.  Before writing further, I should add that I'm an almost complete novice where concurrency is concerned, in general and particularly with D: I've written a few programs that made use of std.parallelism but that's about it.
> 
> In this case, there's a strong need to use std.concurrency, because the functions that will be run in parallel involve generating substantial quantities of random numbers.  AFAICS std.parallelism just isn't safe for that in a statistical sense (leaving aside whatever slowdown shared access to a common rndGen might cause).
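> 
> (As an aside, one way to sidestep the shared-rndGen concern with std.parallelism would be to give each parallel iteration its own explicitly seeded generator, something like the rough sketch below; but for this program I've gone with std.concurrency.)
> 
>     import std.parallelism : parallel;
>     import std.random : Random, uniform, unpredictableSeed;
>     import std.range : iota;
> 
>     void main()
>     {
>         auto sums = new double[](4);
>         // Each iteration constructs its own generator, so no RNG
>         // state is shared between the worker threads.
>         foreach (j; parallel(iota(4), 1))
>         {
>             auto rng = Random(unpredictableSeed);
>             double sum = 0.0;
>             foreach (i; 0 .. 1_000_000)
>                 sum += uniform(0.0, 1.0, rng);
>             sums[j] = sum;
>         }
>     }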
> 
> Now, I'm not naive enough to believe that using n threads will simply result in the program runtime being divided by n.  However, the results I'm getting with some simple test code (attached) are curious and I'd like to understand better what's going on.
> 
> The program is simple enough:
> 
>      foreach(i; iota(n))
>            spawn(&randomFunc, m);
> 
> ... where randomFunc is a function that generates and sums m different random numbers.  For speed comparison, one can instead do:
> 
>      foreach(i; iota(n))
>            randomFunc(m);
> 
> In my tests, m = 100_000_000.
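> 
> Put together, a minimal version of the test looks roughly like this (a sketch of the attached concur.d, minus the timing code; the version switch and the writeln are just for illustration):
> 
>     import std.concurrency : spawn;
>     import std.random : uniform;
>     import std.range : iota;
>     import std.stdio : writeln;
> 
>     // Generates and sums m uniform variates using the default
>     // thread-local generator, so each spawned thread draws from
>     // its own RNG state.
>     void randomFunc(ulong m)
>     {
>         double sum = 0.0;
>         foreach (i; 0 .. m)
>             sum += uniform(0.0, 1.0);
>         writeln(sum);
>     }
> 
>     void main()
>     {
>         enum n = 4;
>         enum m = 100_000_000UL;
> 
>         version (Sequential)
>         {
>             foreach (i; iota(n))
>                 randomFunc(m);
>         }
>         else
>         {
>             foreach (i; iota(n))
>                 spawn(&randomFunc, m);
>             // The runtime joins the spawned (non-daemon) threads
>             // before the program exits.
>         }
>     }
> 
> (The version switch is just so the same file can be built both ways, e.g. with dmd's -version=Sequential flag.)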
> 
> Setting n = 2 on my 4-core laptop, the sequential case runs in about 4 s; the concurrent version using spawn() runs in about 2.2 s (the total "user" time reported is about 4 s for the sequential program and about 4.3 s for the concurrent one).  So, roughly half the runtime, as you might expect.
> 
> Setting n = 3, the sequential case runs in about 6 s (no surprise), the concurrent version in about 3 s (with about 8.1 s of "user" time recorded).  In other words, the runtime is only halved relative to the sequential version (a 2x rather than a 3x speedup), even though there's no shared data and the CPU can easily accommodate the 3 threads at full speed.  (In fact about 270% CPU usage is recorded, but that should still translate into a faster program.)
> 
> Setting n = 4, the sequential case runs in 8 s, the concurrent in about 3.8 s (with 14.8 s of "user" time recorded), with 390% CPU usage.
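> 
> Working out the speedups: n = 2 gives 4 / 2.2, i.e. about 1.8x; n = 3 gives 6 / 3 = 2.0x; n = 4 gives 8 / 3.8, about 2.1x.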
> 
> In other words, it doesn't seem possible to get more than about a 2x speedup on my system from using concurrency, even though there should not be any data races or other factors that might explain the slower performance.
> 
> I didn't expect the runtime to drop by a full factor of n, but I did expect something a little better than this -- so can anyone suggest what might be going on here?  (Unfortunately, I don't have a system with more cores on which to test larger numbers of threads.)
> 
> The times reported here are for programs compiled with GDC, but using LDC or DMD produces similar behaviour.
> 
> Can anyone advise?
> 
> Thanks & best wishes,
> 
>    -- Joe
> <concur.d>

