[phobos] FreeBSD segfaults with std.parallelism
David Simcha
dsimcha at gmail.com
Fri Apr 29 06:26:38 PDT 2011
I've spent some serious time looking into the FreeBSD std.parallelism
segfaults. I'm at a complete loss as to what could be causing them or
how to fix them. Here are some observations. Someone please offer any
suggestions you have.
1. I'm able to reproduce these, though much more sporadically, on
Windows and Linux, by executing the unit test in a loop.
2. On FreeBSD, running GDB on the core dump shows stack traces that
should be impossible. Every time the program crashes, the function at
the top of the stack should be unreachable from the second function from
the top. (It shouldn't even be indirectly reachable, so inlining
can't explain it.) On both Linux and FreeBSD, the program counter
ends up at invalid addresses in the middle of instructions. Even more
weirdly, the address that the program counter ends up at when the
segfault happens seems deterministic for any given platform and
compiler settings. Is there a good debugger for Windows that will give
me stack traces and the like, as GDB does?
3. The triggering test is:
    auto lmchain = poolInstance.map!"a * a"(
        poolInstance.map!sqrt(
            poolInstance.asyncBuf(
                iota(3_000_000)
            )
        )
    );

    foreach(i, elem; parallel(lmchain)) {
        assert(approxEqual(elem, i));
    }
In other words, it's the test that uses everything together (including
Task and amap() under the hood), which makes it the hardest one to debug.
IIUC, the instruction stream can't be overwritten by a buggy program
because the code pages are marked read-only. The only other explanation
I can think of for how the program counter could be corrupted is that
some race condition corrupts either a function pointer or a return
address on the stack. However, in that case I would expect the address
that the program counter ends up at when the segfault happens to be less
deterministic.
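
For anyone who wants to try reproducing the sporadic crashes from
observation 1, a small harness along the following lines can rerun a
test binary until one run is killed by a signal. This is only a sketch
in Python, not part of Phobos or its test suite: the function name
run_until_crash and the run limit are made up for illustration, and the
path to the std.parallelism unittest executable has to be supplied on
the command line.

```python
import signal
import subprocess
import sys


def run_until_crash(cmd, max_runs=1000):
    """Run cmd repeatedly (hypothetical helper, not a Phobos tool).

    Returns (run_number, signal_number) for the first run that was killed
    by a signal, or None if no run was killed within max_runs attempts.
    """
    for run in range(1, max_runs + 1):
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        # On POSIX, a return code of -N means the child died from signal N
        # (for example -11 for SIGSEGV on most systems).
        if result.returncode < 0:
            return run, -result.returncode
    return None


if __name__ == "__main__" and len(sys.argv) > 1:
    # Everything after the script name is treated as the command to run,
    # e.g. the path to the unittest binary (assumed to exist already).
    crash = run_until_crash(sys.argv[1:])
    if crash is None:
        print("no crash observed")
    else:
        run, signum = crash
        print(f"run {run} killed by {signal.Signals(signum).name}")
```

Counting how many runs it takes before the first crash on each platform
gives a rough way to compare how sporadic the failure is there. The
negative-return-code convention only applies on POSIX systems, so on
Windows the check would need to be adapted.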