[phobos] FreeBSD segfaults with std.parallelism

Fri Apr 29 06:41:44 PDT 2011

I'll mention that I was just debugging some code in Linux for dcollections, and the line numbers gdb gave for the unit tests were off by a couple of lines (I had the same reaction you're having: how is this possible?).  That happened only inside the unit tests; in the actual code, the line numbers were correct.  What's more, when I stepped over lines in gdb, the program would jump *back* some lines even though there was no loop in the code.

I wouldn't trust that GDB is giving you accurate info.

My recommendation -- try using writeln debugging if possible.
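
A minimal sketch of that kind of writeln tracing follows. It assumes a recent D compiler and Phobos. `trace` is a hypothetical helper, not a Phobos function. The loop is a simplified stand-in for the failing unit test, not the test itself. Output from worker threads will interleave in no particular order.

```d
import std.stdio : stderr;
import std.format : format;
import std.math : sqrt, fabs;
import std.parallelism : parallel;

// Hypothetical helper (not part of Phobos): prints the caller's file and
// line plus a message to stderr. __FILE__ and __LINE__ used as default
// arguments expand at the call site, not inside trace().
void trace(string msg, string file = __FILE__, size_t line = __LINE__)
{
    stderr.writefln("%s(%s): %s", file, line, msg);
    // Flush so the line is written out before a later crash can lose it.
    stderr.flush();
}

void main()
{
    auto data = new double[1_000];

    trace("starting parallel loop");
    foreach (i, ref elem; parallel(data))
    {
        const s = sqrt(cast(double) i);
        elem = s * s;
        // Tracing every element would flood stderr; sample a few indices.
        if (i % 250 == 0)
            trace(format("worker wrote index %s", i));
    }
    trace("parallel loop finished");

    // Tolerance check analogous to the original test's approxEqual.
    foreach (i, elem; data)
        assert(fabs(elem - i) < 1e-9);
}
```

The last trace line printed before a crash narrows down where it happened without relying on the debugger's line tables. stderr is normally unbuffered, so the flush mostly matters if the helper is redirected to stdout or a log file.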

-Steve

>________________________________
>From: David Simcha <dsimcha at gmail.com>
>To: Discuss the phobos library for D <phobos at puremagic.com>
>Sent: Friday, April 29, 2011 9:26 AM
>Subject: Re: [phobos] FreeBSD segfaults with std.parallelism
>
>I've spent some serious time looking into the FreeBSD std.parallelism segfaults.  I'm at a complete loss as to what could be causing them or how to fix them.  Here are some observations; please offer any suggestions you have.
>
>1.  I'm able to reproduce these on Windows and Linux as well, though much more sporadically, by executing the unit test in a loop.
>
>2.  On FreeBSD, running GDB on the core dump shows stack traces that should be impossible.  Every time the program crashes, the function at the top of the stack should be unreachable from the second function from the top.  (It shouldn't even be indirectly reachable, i.e. inlining couldn't explain it.)  On both Linux and FreeBSD, the program counter ends up at illegal addresses in the middle of instructions.  Even more weirdly, the address where the program counter ends up when the segfault happens seems deterministic for any given platform and compiler settings.  Is there a good debugger for Windows that will give me stack traces and the like, the way GDB does?
>
>3.  The triggering test is:
>
>    auto lmchain = poolInstance.map!"a * a"(
>        poolInstance.map!sqrt(
>            poolInstance.asyncBuf(
>                iota(3_000_000)
>            )
>        )
>    );
>    foreach(i, elem; parallel(lmchain)) {
>        assert(approxEqual(elem, i));
>    }
>
>In other words, it's the test that uses everything together (including Task and amap() under the hood), which makes it the hardest one to debug.
>
>IIUC, a buggy program can't overwrite the instruction stream because the code pages are marked read-only.  The only other explanation I can think of for how the program counter could be corrupted is that some race condition corrupts either a function pointer or a return address on the stack.  However, in that case I'd expect the address where the program counter ends up at the time of the segfault to be less deterministic.
>
>_______________________________________________
>phobos mailing list
>phobos at puremagic.com
>http://lists.puremagic.com/mailman/listinfo/phobos