[phobos] FreeBSD segfaults with std.parallelism

Fri Apr 29 06:26:38 PDT 2011

I've spent some serious time looking into the FreeBSD std.parallelism 
segfaults.  I'm at a complete loss as to what could be causing them or 
how to fix them.  Here are some observations.  Someone please offer any 
suggestions you have.

1.  I'm able to reproduce these, though much more sporadically, on 
Windows and Linux, by executing the unit test in a loop.

2.  On FreeBSD, running GDB on the core dump shows stack traces that 
should be impossible.  Every time the program crashes, the function at 
the top of the stack is one that should be unreachable from the second 
function from the top.  (It shouldn't even be indirectly reachable, so 
inlining couldn't explain it.)  On both Linux and FreeBSD, the program 
counter ends up at illegal addresses in the middle of instructions.  
Even more weirdly, the address the program counter ends up at when the 
segfault happens seems to be deterministic for a given platform and 
set of compiler settings.  Is there a good debugger for Windows that 
will give me stack traces and the like, as GDB does?

3.  The triggering test is:

     auto lmchain = poolInstance.map!"a * a"(
         poolInstance.map!sqrt(
             poolInstance.asyncBuf(
                 iota(3_000_000)
             )
         )
     );
     foreach(i, elem; parallel(lmchain)) {
         assert(approxEqual(elem, i));
     }

In other words, it's the test that uses everything together (including 
Task and amap() under the hood), which makes it the hardest one to debug.
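
For anyone who wants to try reproducing this, the loop from point 1 
wrapped around the test from point 3 might look roughly like the 
standalone sketch below.  This is not the actual Phobos unit test: the 
names runs and toRoot are made up for illustration, creating a fresh 
TaskPool on every iteration is an assumption about how the loop is set 
up, and an explicit tolerance check stands in for approxEqual.

```d
import std.math : abs, sqrt;
import std.parallelism : TaskPool, parallel;
import std.range : iota;
import std.stdio : writeln;

// Hypothetical helper: a module-level function so it can be passed
// as an alias to TaskPool.map without relying on lambda support.
double toRoot(int x)
{
    return sqrt(cast(double) x);
}

void main()
{
    enum runs = 100; // hypothetical repeat count; raise it if crashes are rare

    foreach (run; 0 .. runs)
    {
        // Assumption: a fresh pool per iteration, shut down afterwards.
        auto pool = new TaskPool();
        scope (exit) pool.finish();

        // Same shape as the failing test: asyncBuf feeding two lazy maps.
        auto lmchain = pool.map!"a * a"(
            pool.map!toRoot(
                pool.asyncBuf(
                    iota(3_000_000)
                )
            )
        );

        // Consume the chain in parallel; sqrt(i) squared should be close to i.
        foreach (i, elem; parallel(lmchain))
        {
            assert(abs(elem - i) <= 1e-6 * (i + 1.0));
        }

        writeln("run ", run, " finished");
    }
}
```

If the segfault is as sporadic on Linux and Windows as described in 
point 1, the repeat count may need to be much higher there, and the 
build should not use -release, since that compiles out the assert.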

IIUC, the instruction stream can't be overwritten by a buggy program 
because the code pages are marked read-only.  The only other explanation 
I can think of for how the program counter could be corrupted is if some 
race condition corrupts either a function pointer or a return address on 
the stack.  However, if that were the case, I would expect the address 
that the program counter ends up at to vary from run to run rather than 
be deterministic.