Windows multi-threading performance issues on multi-core systems only

Jacob Carlborg doob at me.com
Wed Dec 16 04:08:30 PST 2009


On 12/16/09 03:44, Michel Fortin wrote:
> On 2009-12-15 19:49:43 -0500, dsimcha <dsimcha at yahoo.com> said:
>
>> == Quote from Simen kjaeraas (simen.kjaras at gmail.com)'s article
>>> Tested this on a Core 2 Duo, same options. OS is Windows 7, 64-bit. It
>>> scales roughly inversely with the number of threads:
>>> 163ms for 1,
>>> 364ms for 2,
>>> 886ms for 4
>>> This is quite different from your numbers, though.
>>
>> Yeah, I forgot to mention my numbers were on Win XP. Maybe Windows 7
>> critical sections are better implemented or something. Can a few other
>> people with a variety of OSes run this benchmark and post their numbers?
>
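The benchmark source is not quoted in this message. As a rough sketch of the kind of measurement being discussed, assuming each worker appends one million uints to its own GC-managed array, a harness might look like the following. The names doAppending and timeThreads, the thread counts, and the use of current druntime/Phobos APIs (std.datetime.stopwatch did not exist in dmd 2.037) are all assumptions for illustration.

```d
import core.thread : Thread;
import std.datetime.stopwatch : StopWatch, AutoStart;
import std.stdio : writefln;

// Hypothetical worker: appends one million uints to a thread-private
// GC-managed array. No user data is shared between threads.
void doAppending() {
    uint[] arr;
    foreach (uint i; 0 .. 1_000_000)
        arr ~= i;
}

// Starts nThreads workers at once, waits for all of them, and returns
// the elapsed wall-clock time in milliseconds.
long timeThreads(size_t nThreads) {
    auto sw = StopWatch(AutoStart.yes);
    Thread[] threads;
    foreach (k; 0 .. nThreads) {
        auto t = new Thread(&doAppending);
        t.start();
        threads ~= t;
    }
    foreach (t; threads)
        t.join();
    return sw.peek.total!"msecs";
}

void main() {
    foreach (n; [1, 2, 4])
        writefln("%s thread(s): %s ms", n, timeThreads(n));
}
```

Because the workers share nothing at the user level, runtime that grows with the thread count would point at contention inside the runtime rather than in the benchmark itself.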
> Core 2 Duo / Mac OS X 10.6 / 4 threads:
>
> Crystal:~ mifo$ ./test
> Set affinity, then press enter.
>
> Bus error
>
> Runs for about 18 seconds, then crashes. At first glance, it looks as if
> the Thread class is broken and for some reason I get a null dereference
> when a thread finishes. Great!
>
> Anyway, I've done some sampling on the program while it runs, and each
> worker thread spends about 85% of its time inside _d_monitorenter and
> 11% in _d_monitorleave soon after the program starts; shortly before the
> program finishes, this becomes 88% and 7% respectively.
>
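A profile dominated by _d_monitorenter suggests the workers are mostly waiting to enter one monitor, presumably a lock guarding the GC. A minimal model of that situation, assuming a single core.sync.mutex.Mutex stands in for the GC lock (globalLock, fakeAllocate and runContended are hypothetical names, not druntime internals), could be:

```d
import core.sync.mutex : Mutex;
import core.thread : Thread;
import std.stdio : writeln;

// Hypothetical stand-in for a GC-wide lock: one mutex shared by all threads.
__gshared Mutex globalLock;
__gshared ulong allocations;

shared static this() {
    globalLock = new Mutex;
}

// Models one allocation: it can only proceed while holding the global lock.
void fakeAllocate() {
    globalLock.lock();
    scope (exit) globalLock.unlock();
    ++allocations;
}

// Runs perThread fake allocations on each of nThreads threads and returns
// the total count once every thread has been joined.
ulong runContended(size_t nThreads, size_t perThread) {
    allocations = 0;
    Thread[] threads;
    foreach (k; 0 .. nThreads) {
        threads ~= new Thread({
            foreach (j; 0 .. perThread)
                fakeAllocate();
        });
        threads[$ - 1].start();
    }
    foreach (t; threads)
        t.join();
    return allocations;
}

void main() {
    writeln(runContended(4, 100_000)); // 400000
}
```

Every fakeAllocate call serializes on globalLock, so extra threads mostly add waiting rather than throughput, which would be consistent with time piling up in the lock-enter routine.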
> The funny thing is that if I just bypass the GC like this:
>
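> import core.stdc.stdlib : realloc;
>
> void doAppending() {
>     uint* arr = null;
>     foreach (i; 0 .. 1_000_000) {
>         // grow by one uint on the C heap, bypassing the D GC entirely
>         arr = cast(uint*) realloc(arr, uint.sizeof * (i + 1));
>         arr[i] = i;
>     }
>     // leak arr
> }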
>
> it finishes (I mean crashes) in less than half a second. So it looks
> like realloc does a much better job of locking its data structure than
> the GC does.
>

It runs fine on Mac OS X 10.5 with dmd 2.037.
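If GC locking is the bottleneck, cutting the number of GC calls per thread should help even without leaving the GC. A sketch of one such workaround, not something proposed in this thread: allocate the whole array once and fill it by index, so each thread enters the GC roughly once instead of on many appends. The names fillPreallocated and worker are hypothetical; the size and thread count mirror Michel's test.

```d
import core.thread : Thread;

// Hypothetical variant: a single GC allocation up front, then plain stores.
// Only the `new` expression calls into the GC.
uint[] fillPreallocated(size_t n) {
    auto arr = new uint[](n);
    foreach (i, ref x; arr)
        x = cast(uint) i;
    return arr;
}

// Hypothetical per-thread entry point.
void worker() {
    fillPreallocated(1_000_000);
}

void main() {
    Thread[] threads;
    foreach (k; 0 .. 4) {
        auto t = new Thread(&worker);
        t.start();
        threads ~= t;
    }
    foreach (t; threads)
        t.join();
}
```

This only helps when the final size is known in advance; it trades the convenience of ~= for far fewer trips through the shared allocator.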



More information about the Digitalmars-d mailing list