Grokking concurrency, message passing and Co

Sun Jul 11 07:28:17 PDT 2010

Hi,

so, after reading TPDL, I decided to test my comprehension of message-passing
and such. Please note that before this day, I never wrote any concurrent code,
ever.

I wanted to begin with a simple problem: summing all numbers in a range. So I
used the following code:

import std.concurrency, std.stdio, std.perf;

void main() {
    immutable low = 0;
    immutable high = 1_000_000_000;

    auto c = new PerformanceCounter();

    foreach(NN; 0..10) { // 1 to 512 threads
        immutable N = 2^^NN;
        int C = 0;
        immutable step = (high-low)/N;
        writeln("# of threads: ", N);

        c.start;
        foreach(n; 0..N)
        {
            auto tid = spawn(&foo, low+n*step, low+(n+1)*step);
            tid.send(thisTid);
        }

        ulong sum;
        while(C < N) {
            auto end = receiveOnly!(bool, ulong)();
            if (end._0 == true) {
                C++;
                sum += end._1;
            }
        }
        c.stop;
        writefln("%s ms. Sum = %s", c.milliseconds,sum);
    }

    writeln("No thread: ");
    c.start;
    ulong count;
    foreach(i; low..high) count += i;
    c.stop;
    writefln("%s ms. Sum = %s", c.milliseconds,count);

//    writeln(Minus!((S!Z), S!(S!Z)).stringof);
//    writeln(Times!(S!(S!Z), S!(S!Z)).stringof);

}

void foo(int low, int high)
{
    ulong sum;
    auto msg = receiveOnly!(Tid)();
    foreach(i; low..high) sum += i;
    msg.send(true,sum);
}

It cuts the range into N parts, generate N parallel threads that will each sum
1/Nth of the range and return the result. The main thread summ it all.

Just for fun, I had to do that for N being 1, 2, 4, 8, ..., 512 threads.

The results:

the no threads version takes about 6 to 9s to complete on my machine.
The 2 to 512 threads take 1.8s repeatedly. I can understand that as this
machine has two cores.

So:

- It works! Woo hoo! Look Ma, I'm using message-passing!
- Maybe what I'm dogin is more parallelism than concurrency. But I admit being
more interested in the former than the latter.
- Why is a 2 threads version repeatedly thrice as fast as a no thread version?
I thought it'd be only twice as fast.
- It's fun to see the process running under windows: first only 50% CPU (1
thread), then it jumps to ~100%, while the memory is slowly growing. Then
brutally back to 50% CPU (no thread).
- 1024 threads are OK, but I cannot reach 2048. Why? What is the limit for the
number of spawn I can do? Would that be different if each threads spawn two
sub-threads instead of the main one generating 2048?

- What is the official way to verify that all threads are done? As you can
see, I used a while loop, but I'm reaching in the dark, there.
- I have to read TDPL again: in thread, is there any way to get the Tid of the
thread that spawned you apart from waiting for it to send its Tid?
- is there any way to broadcast a message to a list of threads?

Philippe