How to debug (potential) GC bugs?

Johannes Pfau via Digitalmars-d-learn digitalmars-d-learn at puremagic.com
Fri Oct 7 14:01:52 PDT 2016


On Sun, 25 Sep 2016 16:23:11 +0000,
Matthias Klumpp <matthias at tenstral.net> wrote:

> Hello!
> I am working together with others on the D-based 
> appstream-generator[1] project, which is generating software 
> metadata for "software centers" and other package-manager 
> functionality on Linux distributions, and is used by default on 
> Debian, Ubuntu and Arch Linux.
> 
> For Ubuntu, some modifications on the code were needed, and 
> apparently for them the code is currently crashing in the GC 
> collection thread: http://paste.debian.net/840490/
> 
> The project is running a lot of stuff in parallel and is using 
> the GC (if the extraction is a few seconds slower due to the GC 
> being active, it doesn't matter much).
> 
> [...]
> 
> 2) How can one debug issues like the one mentioned above 
> properly? Since it seems to happen in the GC and doesn't give me 
> information on where to start searching for the issue, I am a bit 
> lost.
> 

Can you get the GDC & LDC phobos versions? 

We added shared library support in 2.068, which replaced much of the
GDC-specific backported GC/TLS code with the standard upstream
implementation. So using a recent 2.068-based GDC could help.

Judging from the stack trace you're probably using a 2.067 phobos:
https://github.com/D-Programming-GDC/GDC/blob/722cf5670d927ef6182bf1b72765a64ca0fde693/libphobos/libdruntime/rt/lifetime.d#L1423



Here's some advice for debugging such a problem:
The memory layout is usually deterministic when restarting the app in
gdb with the run command. So you can do this:

gdb app
# run
# SIGSEGV in ....
# bt
Then get the value of p at the time of the crash; in the posted stack
trace it is 0x7fdfae368000. Note the == in the condition (a single =
would assign to p instead of comparing):
# break rt_finalize2 if p == 0x7fdfae368000
# run
This should now break whenever the object is finalized, so you can check
if it is collected twice. You can also use next to step until you get
the classinfo in c and then print the classinfo contents: print c
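
A minimal sketch of the same check as a gdb command file, assuming the
address 0x7fdfae368000 from the posted trace and a binary named app (the
file name finalize.gdb is made up for illustration). It prints a
backtrace every time rt_finalize2 is hit for that pointer and then
continues, so two backtraces for the same address before the SIGSEGV
would point at a double finalization:

```gdb
# finalize.gdb (hypothetical file name)
# Don't stop at the pager prompt while printing backtraces.
set pagination off
# Break only for the pointer that crashed in the earlier run.
break rt_finalize2 if p == 0x7fdfae368000
# Commands attached to the breakpoint just set: log where we are, go on.
commands
  bt
  continue
end
run
```

Load it with: gdb -x finalize.gdb app
This only helps as long as the memory layout really is the same between
runs, as mentioned above.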

You can also use write breakpoints (watchpoints) to find data corruption.
First find the value of pc:
# break lifetime.d:1418 if p == 0x7fdfae368000
# run
# print ppv
# watch -l pc
# or watch * (value of ppv)

Then disable the old breakpoint and run from the start:
# disable 1
# run

This should now break when data is written to the location.

(The commands might not be 100% correct ;-)

