More on Multithreading Performance
dsimcha
dsimcha at yahoo.com
Wed Dec 16 18:12:12 PST 2009
Our multithreading performance problems can probably be mitigated, at least on
Windows, by using InitializeCriticalSectionAndSpinCount instead of
InitializeCriticalSection to implement synchronized blocks. According to
http://msdn.microsoft.com/en-us/library/ms683476%28VS.85%29.aspx this causes
the waiting thread to spin a specified number of times before being context
switched, but only on multiprocessor computers.
This seems like a no-brainer for the GC lock. Having a small amount of
spinning before the context switch also seems like a pretty good default for
synchronized blocks in general. People who really want to customize things
like this will use something more customizable than a plain old
synchronized block.
Here's a test program that measures the speed-up.
import core.thread, std.stdio, std.perf, core.sys.windows.windows, std.conv,
    std.string;

// Prototype for the Win32 function used below.
extern(Windows) BOOL InitializeCriticalSectionAndSpinCount(CRITICAL_SECTION*,
    DWORD);

enum nThreads = 2;
__gshared int num = 0;
__gshared CRITICAL_SECTION lock;

void main(string[] args) {
    stderr.writeln("Give me a spin count.");
    int spinCount = to!int( readln().strip() );

    // A waiting thread spins up to spinCount times before it is context
    // switched.
    InitializeCriticalSectionAndSpinCount(&lock, spinCount);

    // Time the whole run: thread start-up, contention, and join.
    auto pc = new PerformanceCounter;
    pc.start;

    auto threads = new Thread[nThreads];
    for(int i = 0; i < nThreads; i++) {
        threads[i] = new Thread(&doStuff);
        threads[i].start();
    }

    foreach(thread; threads) {
        thread.join();
    }

    pc.stop;
    writeln(pc.milliseconds);
}

// Each thread acquires and immediately releases the shared lock, so the
// measured time is almost entirely locking overhead under contention.
void doStuff() {
    for(int i = 0; i < 10_000_000; i++) {
        EnterCriticalSection(&lock);
        LeaveCriticalSection(&lock);
    }
}
spin count = 0: 3843 ms
spin count = 4000: 2095 ms
core.sync.Mutex doesn't use this feature. Neither do synchronized blocks.
Based on looking at the source files for these, it seems trivial to start
using it. Anyone see a good reason not to, or should I Bugzilla/patch this one?
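
For anyone who wants to try the idea without touching the runtime, the spin-then-block behaviour can be approximated on top of the existing core.sync.mutex.Mutex. The following is only a rough sketch under stated assumptions: lockWithSpin and runCounter are hypothetical helper names, not part of druntime, and the loop simply retries tryLock() instead of using whatever spinning strategy Windows applies inside a critical section.

```d
import core.sync.mutex : Mutex;
import core.thread : Thread;

// Hypothetical helper (not in druntime): retry tryLock() up to spinCount
// times, then fall back to an ordinary blocking lock().
void lockWithSpin(Mutex m, uint spinCount)
{
    foreach (_; 0 .. spinCount)
    {
        if (m.tryLock())
            return;            // acquired without blocking
    }
    m.lock();                  // give up spinning and block
}

// Hypothetical driver: nThreads threads each increment a shared counter
// `iterations` times under the lock, and the final count is returned.
int runCounter(uint spinCount, int nThreads, int iterations)
{
    auto m = new Mutex;
    int counter = 0;           // only touched while m is held

    auto threads = new Thread[nThreads];
    foreach (ref t; threads)
    {
        t = new Thread({
            foreach (i; 0 .. iterations)
            {
                lockWithSpin(m, spinCount);
                ++counter;
                m.unlock();
            }
        });
        t.start();
    }

    foreach (t; threads)
        t.join();

    return counter;
}
```

Calling runCounter(4000, 2, 100_000) from main should return 200_000 if the lock is doing its job. Timing that call with different spin counts would give a rough, portable analogue of the test program above, though the numbers are not expected to match those of the native critical section.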