Windows multi-threading performance issues on multi-core systems only

Michel Fortin michel.fortin at michelf.com
Tue Dec 15 18:44:42 PST 2009


On 2009-12-15 19:49:43 -0500, dsimcha <dsimcha at yahoo.com> said:

> == Quote from Simen kjaeraas (simen.kjaras at gmail.com)'s article
>> Tested this on a Core 2 Duo, same options. OS is Windows 7, 64bit. It
>> scales roughly inverse linearly with number of threads:
>> 163ms for 1,
>> 364ms for 2,
>> 886ms for 4
>> This is quite different from your numbers, though.
> 
> Yea, forgot to mention my numbers were on Win XP.  Maybe Windows 7 critical
> sections are better implemented or something.   Can a few other people with a
> variety of OS's run this benchmark and post their numbers?

Core 2 Duo / Mac OS X 10.6 / 4 threads:

	Crystal:~ mifo$ ./test
	Set affinity, then press enter.

	Bus error

Runs for about 18 seconds, then crashes. At first glance, it looks as 
if the Thread class is broken and for some reason I get a null 
dereference when a thread finishes. Great!

Anyway, I've done some sampling on the program while it runs, and each 
of the worker threads spends about 85% of its time inside _d_monitorenter 
and 11% in _d_monitorleave soon after the program starts, which later 
becomes 88% and 7% respectively shortly before the program finishes.

The funny thing is that if I just bypass the GC like this:

	import core.stdc.stdlib : realloc;

	void doAppending() {
		uint* arr = null;
		foreach(i; 0..1_000_000) {
			// grow by one element (uint.sizeof, not the pointer size)
			arr = cast(uint*)realloc(arr, uint.sizeof * (i+1));
			arr[i] = i;
		}
		// leak arr
	}

it finishes (I mean crashes) in less than half a second. So it looks 
like realloc does a much better job at locking its data structure than 
the GC.
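
For comparison, here is a minimal side-by-side sketch of the two approaches. 
The original benchmark isn't quoted in this message, so the GC version is 
an assumption (I'm guessing it appends to a built-in array with ~=), and 
the function names and the small element count are only illustrative:

```d
import core.stdc.stdlib : realloc, free;

// GC-backed appending (assumed shape of the original benchmark): whenever
// `~=` has to grow the array it calls into the runtime's allocator, which
// the profile above suggests is serialized through a monitor.
uint[] appendWithGC(size_t n) {
	uint[] arr;
	foreach (i; 0 .. n) {
		arr ~= cast(uint) i;
	}
	return arr;
}

// Manual growth through the C heap, bypassing the GC entirely.
uint* appendWithRealloc(size_t n) {
	uint* arr = null;
	foreach (i; 0 .. n) {
		arr = cast(uint*) realloc(arr, uint.sizeof * (i + 1));
		arr[i] = cast(uint) i;
	}
	return arr; // the caller is responsible for free()
}

void main() {
	auto a = appendWithGC(1_000);
	auto b = appendWithRealloc(1_000);
	// both approaches should produce the same contents
	assert(a.length == 1_000 && a[999] == 999 && b[999] == 999);
	free(b);
}
```

Running each function from several threads at once and timing it should 
show whether the slowdown follows the GC path or the realloc path.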

-- 
Michel Fortin
michel.fortin at michelf.com
http://michelf.com/



