Windows multi-threading performance issues on multi-core systems only
zsxxsz
zhengshuxin at hexun.com
Tue Dec 15 00:26:11 PST 2009
== Quote from dsimcha (dsimcha at yahoo.com)'s article
> == Quote from Dan (dsstruthers at yahoo.com)'s article
> > I have a question regarding performance issue I am seeing on multicore Windows
> > systems. I am creating many threads to do parallel tasks, and on multicore
> > Windows systems the performance is abysmal. If I use task manager to set the
> > processor affinity to a single CPU, the program runs as I would expect. Without
> > that, it takes about 10 times as long to complete.
> > Am I doing something wrong? I have tried DMD 2.0.37 and DMD 1.0.53 with the
> > same results, running the binary on both a dual-core P4 and a newer Core2 duo.
> > Any help is greatly appreciated!
> I've seen this happen before. Without knowing the details of your code, my best
> guess is that you're getting a lot of contention for the GC lock. (It could also
> be some other lock, but if it were, there's a good chance you'd already know it
> because it wouldn't be hidden.) The current GC design isn't very
> multithreading-friendly yet. It requires a lock on every allocation.
> Furthermore, the array append operator (~=) currently takes the GC lock on **every
> append** to query the GC for info about the memory block that the array points to.
> There's been plenty of talk about what should be done to eliminate this, but
> nothing has been implemented so far.
> Assuming I am right about why your code is so slow, here's how to deal with it:
> 1. Cut down on unnecessary memory allocations. Use structs instead of classes
> where it makes sense.
> 2. Try to stack allocate stuff. alloca is your friend.
> 3. Pre-allocate arrays if you know ahead of time how long they're supposed to be.
> If you don't know how long they're supposed to be, use std.array.Appender (in D2)
> for now until a better solution gets implemented. Never use ~= in multithreaded
> code that gets executed a lot.
Yes, I've seen this before, too. But in my multithreaded code the allocations
can't be avoided, so D's GC should improve its performance for multithreaded programs.
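
For reference, the third suggestion quoted above (pre-allocate when the length is known, otherwise use std.array.Appender instead of ~=) might look roughly like this in D2. This is only a minimal sketch, not code from the thread: the function names and the "squares" workload are made up for illustration, and it reduces how often each thread has to go to the GC rather than removing the GC lock itself.

```d
import std.array : appender;

// Hypothetical per-thread task: collect squares of 0 .. n.

// Length known up front: allocate once, then fill in place (no appends).
int[] squaresPreallocated(size_t n)
{
    auto result = new int[](n);        // a single GC allocation
    foreach (i, ref x; result)
        x = cast(int)(i * i);
    return result;
}

// Length not known up front: Appender keeps track of its own capacity,
// so it does not need to ask the GC about the block on each element.
int[] evenSquares(size_t n)
{
    auto app = appender!(int[])();
    foreach (i; 0 .. n)
    {
        immutable sq = cast(int)(i * i);
        if (sq % 2 == 0)
            app.put(sq);
    }
    return app.data;
}

void main()
{
    assert(squaresPreallocated(4) == [0, 1, 4, 9]);
    assert(evenSquares(5) == [0, 4, 16]);
}
```

Each worker thread would call functions like these on its own data. Where allocations really cannot be avoided, as in my case, this only lowers the contention; it does not eliminate it.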