std.parallelism: Request for Review
dsimcha
dsimcha at yahoo.com
Sun Feb 27 08:36:02 PST 2011
On 2/27/2011 9:48 AM, dsimcha wrote:
> On 2/27/2011 8:03 AM, Russel Winder wrote:
>> 32-bit mode on a 8-core (twin Xeon) Linux box. That core.cpuid bug
>> really, really sucks.
>>
>> I see matrix inversion takes longer with 4 cores than with 1!
>
Actually, I am able to reproduce this, but only on Linux, and I think I
figured out why. I think it's related to my Posix workaround for Bug
3753 (http://d.puremagic.com/issues/show_bug.cgi?id=3753). This
workaround causes GC heap allocations to occur in a loop inside the
matrix inversion routine (one for each call to parallel(), so 256 over
the course of the benchmark). This was intended to be a very quick and
dirty workaround for a DMD bug that I thought would get fixed a long
time ago. It also seemed good enough at the time because I was using
this lib for very coarse grained parallelism, where the effect is
negligible.

Originally, I was using alloca() all over the place to efficiently deal
with memory management. However, under Posix, I ran into Bug 3753 a
long time ago and put in the following workaround, which simply forwards
alloca() calls to the GC. From near the top of parallelism.d:

    // Workaround for bug 3753.
    version(Posix) {
        // Can't use alloca() because it can't be used with exception
        // handling.
        // Use the GC instead even though it's slightly less efficient.
        void* alloca(size_t nBytes) {
            return GC.malloc(nBytes);
        }
    } else {
        // Can really use alloca().
        import core.stdc.stdlib : alloca;
    }

In this particular use case the performance hit is probably substantial.
There are ways to mitigate it (maybe having TaskPool maintain a free
list, etc.), but I can't bring myself to put a lot of effort into
optimizing a workaround for a compiler bug.
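
For illustration only, here is a minimal sketch of what a free list of
scratch buffers could look like. BufferFreeList, acquire() and release()
are hypothetical names, not part of std.parallelism, and the sketch
assumes fixed-size buffers and a single thread; a version living inside
TaskPool would also need locking.

```d
import core.memory : GC;

// Hypothetical helper (not part of std.parallelism): recycles fixed-size
// scratch buffers so repeated requests do not hit the GC every time.
// Not thread-safe as written.
struct BufferFreeList {
    private static struct Node { Node* next; }

    private Node* head;      // singly linked list of idle buffers
    private size_t bufSize;  // size of every buffer handed out

    this(size_t nBytes) {
        // An idle buffer stores the list link in its own first bytes,
        // so it must be at least as large as a Node.
        bufSize = nBytes < Node.sizeof ? Node.sizeof : nBytes;
    }

    // Reuse an idle buffer if there is one; otherwise ask the GC.
    void* acquire() {
        if (head !is null) {
            auto node = head;
            head = node.next;
            return cast(void*) node;
        }
        return GC.malloc(bufSize);
    }

    // Put a buffer back on the list for later reuse.
    void release(void* p) {
        auto node = cast(Node*) p;
        node.next = head;
        head = node;
    }
}

unittest {
    auto pool = BufferFreeList(64);
    auto a = pool.acquire();          // first request allocates
    pool.release(a);
    assert(pool.acquire() is a);      // second request reuses the block
}
```

With something like this, only the first request per buffer would reach
the GC; later requests would be served from the list. Whether that
recovers the lost performance in the matrix inversion benchmark is
untested.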