Windows multi-threading performance issues on multi-core systems only

dsimcha dsimcha at yahoo.com
Tue Dec 15 10:15:37 PST 2009


== Quote from Steven Schveighoffer (schveiguy at yahoo.com)'s article
> Yes, but why does multiple cores make the problem worse?  If it's the
> lock, then I'd expect just locking in multiple threads without any
> appending does worse on multiple cores than on a single core.

It does.

import std.stdio, std.perf, core.thread;

void main() {
    // Pause here so CPU affinity can be set (e.g. via Task Manager)
    // before the timing starts.
    writeln("Set affinity, then press enter.");
    readln();

    auto pc = new PerformanceCounter;
    pc.start;

    enum nThreads = 4;
    auto threads = new Thread[nThreads];
    foreach(ref thread; threads) {
        thread = new Thread(&doStuff);
        thread.start();
    }

    foreach(thread; threads) {
        thread.join();
    }

    pc.stop;
    writeln(pc.milliseconds);
}

void doStuff() {
    // Acquire and release the lock a million times with an empty body,
    // so the measured time is pure locking overhead and contention.
    foreach(i; 0..1_000_000) {
        synchronized {}
    }
}

Timing with affinity for all CPUs:  20772 ms.
Timing with affinity for 1 CPU:  156 ms.
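
Incidentally, the affinity can also be set programmatically instead of through
Task Manager; here's a Windows-only sketch, assuming the core.sys.windows
bindings (untested):

import core.sys.windows.windows;

// Restrict the current process to CPU 0 so the single-core numbers can be
// reproduced without touching Task Manager.
void pinToOneCpu() {
    SetProcessAffinityMask(GetCurrentProcess(), 1);
}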

Heavy lock contention **kills** multithreaded code because not only do you serialize
everything, but the OS also has to perform a context switch on every contended
lock acquisition.

I posted about a year ago that using spinlocks in the GC massively sped things
up, at least in synthetic benchmarks, if you have heavy contention and multiple
cores.  See
http://www.digitalmars.com/d/archives/digitalmars/D/More_on_GC_Spinlocks_80485.html
However, that post largely got ignored.
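
For concreteness, here's roughly what I mean by a spinlock.  This is just a
sketch using core.atomic, not the GC's actual lock:

import core.atomic;

// Illustrative spinlock sketch.  A contended lock just retries the CAS in
// user space instead of asking the kernel to block the thread, so there is
// no context switch on contention (at the cost of burning CPU while waiting).
shared bool lockFlag;

void spinLock() {
    while (!cas(&lockFlag, false, true)) {}
}

void spinUnlock() {
    atomicStore(lockFlag, false);
}

Swapping the synchronized {} in doStuff() above for spinLock()/spinUnlock()
gives a rough feel for the difference under contention.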

> If it's the
> lookup, why does it take longer to lookup on multiple cores?

Because appending to multiple arrays simultaneously (whether on a single core or
on multiple cores) causes each array's append to evict the other array's entry
from the GC block info cache.  If you set the affinity to only 1 CPU, this
eviction happens only once per context switch instead of on nearly every append.
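
The sort of access pattern I mean looks like this (hypothetical example, just
to illustrate; each thread appends to its own array):

import core.thread;

// Each append needs the GC block info for its own array.  With the two
// loops interleaved across cores, each lookup tends to evict the other
// array's entry from the cache, so nearly every append misses.
void appendLoop() {
    int[] arr;
    foreach(i; 0..1_000_000) {
        arr ~= i;
    }
}

void main() {
    auto a = new Thread(&appendLoop);
    auto b = new Thread(&appendLoop);
    a.start();
    b.start();
    a.join();
    b.join();
}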


