Windows multi-threading performance issues on multi-core systems only

Steven Schveighoffer schveiguy at yahoo.com
Tue Dec 15 06:30:05 PST 2009


On Mon, 14 Dec 2009 21:18:28 -0500, dsimcha <dsimcha at yahoo.com> wrote:

> == Quote from Dan (dsstruthers at yahoo.com)'s article
>> I have a question regarding a performance issue I am seeing on multicore
>> Windows systems.  I am creating many threads to do parallel tasks, and on
>> multicore Windows systems the performance is abysmal.  If I use Task Manager
>> to set the processor affinity to a single CPU, the program runs as I would
>> expect.  Without that, it takes about 10 times as long to complete.
>> Am I doing something wrong?  I have tried DMD 2.037 and DMD 1.053 with the
>> same results, running the binary on both a dual-core P4 and a newer
>> Core 2 Duo.
>> Any help is greatly appreciated!
>
> I've seen this happen before.  Without knowing the details of your code, my
> best guess is that you're getting a lot of contention for the GC lock.  (It
> could also be some other lock, but if it were, there's a good chance you'd
> already know it because it wouldn't be hidden.)  The current GC design isn't
> very multithreading-friendly yet.  It requires a lock on every allocation.
> Furthermore, the array append operator (~=) currently takes the GC lock on
> **every append** to query the GC for info about the memory block that the
> array points to.  There's been plenty of talk about what should be done to
> eliminate this, but nothing has been implemented so far.

I would suspect something else.  Actually, I would expect that in an
allocation-heavy design, running on multiple cores should be at *least* as
fast as running on a single core.  He also only has 2 cores.  A 10x slowdown
from splitting the parallel tasks across 2 cores is very alarming.  I would
suspect the application design before the GC in this case.  If it's a
fundamental D issue, then we need to fix it ASAP, especially since D2 is
supposed to be (among other things) an upgrade for multi-core.

Maybe I'm wrong.  Is there a good test case that shows it is worse on
multiple cores?
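Something like the following could be a starting point.  It's a sketch in
C++ rather than D, and it only models the hypothesis: a single global mutex
(the hypothetical `g_alloc_lock`) is taken on every allocation and free,
standing in for the GC lock.  It is not how druntime's GC is actually
written.  The hypothetical `run_benchmark` splits a fixed number of small
allocations across N threads and reports the wall-clock time.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a GC that takes one global lock on every
// allocation and free.  This models the hypothesis being discussed; it is
// not how druntime's GC is implemented.
static std::mutex g_alloc_lock;
static long g_alloc_count = 0;  // guarded by g_alloc_lock

static void* locked_alloc(std::size_t n) {
    std::lock_guard<std::mutex> guard(g_alloc_lock);
    ++g_alloc_count;
    return std::malloc(n);
}

static void locked_free(void* p) {
    std::lock_guard<std::mutex> guard(g_alloc_lock);
    std::free(p);  // free(nullptr) is a no-op
}

// Allocation-heavy worker: every iteration goes through the global lock twice.
static void worker(long iterations) {
    for (long i = 0; i < iterations; ++i) {
        void* p = locked_alloc(32);
        if (p != nullptr) {
            static_cast<volatile char*>(p)[0] = 1;  // touch the block
        }
        locked_free(p);
    }
}

// Splits total_allocs allocations across num_threads threads and returns the
// elapsed wall-clock time in milliseconds.
double run_benchmark(int num_threads, long total_allocs) {
    assert(num_threads > 0 && total_allocs >= 0);
    std::vector<std::thread> threads;
    const long per_thread = total_allocs / num_threads;
    const long remainder = total_allocs % num_threads;

    const auto start = std::chrono::steady_clock::now();
    for (int t = 0; t < num_threads; ++t) {
        // Thread 0 also picks up any remainder so the total work is identical.
        threads.emplace_back(worker, per_thread + (t == 0 ? remainder : 0));
    }
    for (std::thread& th : threads) {
        th.join();
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Calling `run_benchmark(1, N)` and `run_benchmark(2, N)` from a `main`, once
as-is and once with the process pinned to a single CPU, would show whether a
per-allocation lock by itself can produce anything like a 10x slowdown.  I'd
expect the two-thread run to gain little over one thread, since the lock
serializes nearly all of the work; whether it gets dramatically worse depends
on the OS and the lock implementation, which is exactly what the test would
measure.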

-Steve



More information about the Digitalmars-d mailing list