Windows multi-threading performance issues on multi-core systems only

dsimcha dsimcha at yahoo.com
Mon Dec 14 18:18:28 PST 2009


== Quote from Dan (dsstruthers at yahoo.com)'s article
> I have a question regarding performance issue I am seeing on multicore Windows
> systems.  I am creating many threads to do parallel tasks, and on multicore
> Windows systems the performance is abysmal.  If I use task manager to set the
> processor affinity to a single CPU, the program runs as I would expect.  Without
> that, it takes about 10 times as long to complete.
> Am I doing something wrong?  I have tried DMD 2.0.37 and DMD 1.0.53 with the
> same results, running the binary on both a dual-core P4 and a newer Core2 duo.
> Any help is greatly appreciated!

I've seen this happen before.  Without knowing the details of your code, my best
guess is that you're getting a lot of contention for the GC lock.  (It could also
be some other lock, but in that case there's a good chance you'd already know
about it, because you'd be taking it explicitly in your own code rather than
having it hidden inside the runtime.)  The current GC design isn't very
multithreading-friendly yet.  It requires a lock on every allocation.
Furthermore, the array append operator (~=) currently takes the GC lock on **every
append** to query the GC for info about the memory block that the array points to.
There's been plenty of talk about what should be done to eliminate this, but
nothing has been implemented so far.
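
To make that concrete, here is a minimal sketch of the kind of code I'd expect to
hit this.  The class name AppendWorker and its fields are made up for
illustration, since I haven't seen your code, and it's written against a current
druntime.  Each thread does nothing but append in a tight loop, so if appends go
through the GC lock, the threads spend most of their time waiting on each other
rather than doing work.

```d
import core.thread : Thread;

// Hypothetical worker thread that grows an array with ~= in a hot loop.
class AppendWorker : Thread
{
    size_t itemCount;
    int[] items;

    this(size_t itemCount)
    {
        this.itemCount = itemCount;
        super(&run);
    }

private:
    void run()
    {
        // Per the description above, every one of these appends has to
        // ask the GC about the underlying memory block.
        foreach (i; 0 .. itemCount)
            items ~= cast(int) i;
    }
}

void main()
{
    AppendWorker[] workers;
    foreach (t; 0 .. 4)
        workers ~= new AppendWorker(100_000);

    foreach (w; workers)
        w.start();
    foreach (w; workers)
        w.join();

    foreach (w; workers)
        assert(w.items.length == 100_000);
}
```

If timing something like this with and without the affinity mask shows the same
gap you're seeing in your real program, that would point at the GC lock.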

Assuming I am right about why your code is so slow, here's how to deal with it:

1.  Cut down on unnecessary memory allocations.  Use structs instead of classes
where it makes sense.

2.  Try to stack allocate stuff.  alloca is your friend.

3.  Pre-allocate arrays if you know ahead of time how long they're supposed to be.
If you don't know how long they're supposed to be, use std.array.Appender (in D2)
for now until a better solution gets implemented.  Never use ~= in multithreaded
code that gets executed a lot.
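
A rough sketch of points 2 and 3, assuming a current Phobos (the Appender
interface may differ in older releases).  The function names are invented for the
example.  For the stack case I've used a fixed-size static array rather than
alloca, since it's the simplest way to get stack storage when you have a small
upper bound on the size.

```d
import std.array : appender;

// Length known up front: one allocation, then plain indexed writes.
int[] squaresPreallocated(size_t n)
{
    auto result = new int[](n);
    foreach (i, ref x; result)
        x = cast(int) (i * i);
    return result;
}

// Length not known up front: Appender keeps track of its own capacity,
// so it does not need to query the GC on each put.
int[] evensWithAppender(size_t limit)
{
    auto app = appender!(int[])();
    foreach (i; 0 .. limit)
        if (i % 2 == 0)
            app.put(cast(int) i);
    return app.data;
}

// Small fixed upper bound: a static array lives on the stack, no GC involved.
int sumOfFirst(size_t n)
{
    int[64] scratch;
    assert(n <= scratch.length);
    foreach (i; 0 .. n)
        scratch[i] = cast(int) i;

    int total = 0;
    foreach (v; scratch[0 .. n])
        total += v;
    return total;
}

void main()
{
    assert(squaresPreallocated(4) == [0, 1, 4, 9]);
    assert(evensWithAppender(7) == [0, 2, 4, 6]);
    assert(sumOfFirst(5) == 10);
}
```

Appender still allocates when it has to grow, but far less often than once per
element, which is what matters when several threads are doing this at once.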


