Windows multi-threading performance issues on multi-core systems only
Steven Schveighoffer
schveiguy at yahoo.com
Tue Dec 15 06:30:05 PST 2009
On Mon, 14 Dec 2009 21:18:28 -0500, dsimcha <dsimcha at yahoo.com> wrote:
> == Quote from Dan (dsstruthers at yahoo.com)'s article
>> I have a question regarding a performance issue I am seeing on
>> multicore Windows systems. I am creating many threads to do parallel
>> tasks, and on multicore Windows systems the performance is abysmal. If
>> I use task manager to set the processor affinity to a single CPU, the
>> program runs as I would expect. Without that, it takes about 10 times
>> as long to complete.
>> Am I doing something wrong? I have tried DMD 2.0.37 and DMD 1.0.53
>> with the same results, running the binary on both a dual-core P4 and a
>> newer Core2 duo.
>> Any help is greatly appreciated!
>
> I've seen this happen before. Without knowing the details of your code,
> my best guess is that you're getting a lot of contention for the GC
> lock. (It could also be some other lock, but if it were, there's a good
> chance you'd already know it because it wouldn't be hidden.) The current
> GC design isn't very multithreading-friendly yet. It requires a lock on
> every allocation. Furthermore, the array append operator (~=) currently
> takes the GC lock on **every append** to query the GC for info about the
> memory block that the array points to. There's been plenty of talk about
> what should be done to eliminate this, but nothing has been implemented
> so far.

I would suspect something else. I would actually expect that an
allocation-heavy design running on multiple cores should be at *least* as
fast as running on a single core. He also has only 2 cores. For
splitting the parallel tasks across 2 cores to take 10x longer is very
alarming. I would suspect the application design before the GC in this
case.

If it's a fundamental D issue, then we need to fix it ASAP, especially
since D2 is supposed to be (among other things) an upgrade for multi-core.

Maybe I'm wrong; is there a good test case to prove it is worse on
multiple cores?

-Steve
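
A test case of the kind asked for above might look roughly like the
following. This is only a sketch, not code from the thread: the names
worker, timeRun and appendsPerThread are made up for illustration, and it
assumes a current D2 compiler with std.datetime.stopwatch, which the 2009
releases mentioned above did not ship. Each thread appends to its own
private array, so the only thing the threads can contend on is whatever
the runtime locks internally during ~=. Whether it reproduces a slowdown
will depend on the compiler and runtime version in use.

```d
import core.thread : Thread;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio : writefln;

// Hypothetical workload size; tune it so one run takes a second or so.
enum appendsPerThread = 1_000_000;

// Each worker grows its own private array. No user-level data is shared,
// so any contention comes from locks taken inside the runtime on append.
void worker()
{
    int[] data;
    foreach (i; 0 .. appendsPerThread)
        data ~= i;
}

// Start nThreads workers, wait for all of them, return elapsed msecs.
long timeRun(size_t nThreads)
{
    auto sw = StopWatch(AutoStart.yes);
    Thread[] threads;
    foreach (_; 0 .. nThreads)
    {
        auto t = new Thread(&worker);
        t.start();
        threads ~= t;
    }
    foreach (t; threads)
        t.join();
    return sw.peek.total!"msecs";
}

void main()
{
    // The second run does twice the total work on two threads. Without
    // contention on a dual-core machine it should take about as long as
    // the first; taking much more than twice as long points at a lock.
    writefln("1 thread:  %s ms", timeRun(1));
    writefln("2 threads: %s ms", timeRun(2));
}
```

Running the same binary with its affinity pinned to one CPU, as the
original poster did, gives a third number to compare against.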