Issues with debugging GC-related crashes #2

Matthias Klumpp mak at debian.org
Fri Apr 20 00:11:25 UTC 2018


On Thursday, 19 April 2018 at 18:45:41 UTC, kinke wrote:
> On Thursday, 19 April 2018 at 17:01:48 UTC, Matthias Klumpp 
> wrote:
>> Something that maybe is relevant though: I occasionally get 
>> the following SIGABRT crash in the tool on machines which have 
>> the SIGSEGV crash:
>> ```
>> Thread 53 "appstream-gener" received signal SIGABRT, Aborted.
>> [Switching to Thread 0x7fdfe98d4700 (LWP 7326)]
>> 0x00007ffff5040428 in __GI_raise (sig=sig@entry=6) at 
>> ../sysdeps/unix/sysv/linux/raise.c:54
>> 54      ../sysdeps/unix/sysv/linux/raise.c: No such file or 
>> directory.
>> (gdb) bt
>> #0  0x00007ffff5040428 in __GI_raise (sig=sig@entry=6) at 
>> ../sysdeps/unix/sysv/linux/raise.c:54
>> #1  0x00007ffff504202a in __GI_abort () at abort.c:89
>> #2  0x0000000000780ae0 in core.thread.Fiber.allocStack(ulong, 
>> ulong) (this=0x7fde0758a680, guardPageSize=4096, sz=20480) at 
>> src/core/thread.d:4606
>> #3  0x00000000007807fc in 
>> _D4core6thread5Fiber6__ctorMFNbDFZvmmZCQBlQBjQBf 
>> (this=0x7fde0758a680, guardPageSize=4096, sz=16384, dg=...)
>>     at src/core/thread.d:4134
>> #4  0x00000000006f9b31 in 
>> _D3std11concurrency__T9GeneratorTAyaZQp6__ctorMFDFZvZCQCaQBz__TQBpTQBiZQBx (this=0x7fde0758a680, dg=...)
>>     at 
>> /home/ubuntu/dtc/dmd/generated/linux/debug/64/../../../../../druntime/import/core/thread.d:4126
>> #5  0x00000000006e9467 in 
>> _D5asgen8handlers11iconhandler5Theme21matchingIconFilenamesMFAyaSQCl5utils9ImageSizebZC3std11concurrency__T9GeneratorTQCfZQp (this=0x7fdea2747800, relaxedScalingRules=true, size=..., iname=...) at ../src/asgen/handlers/iconhandler.d:196
>> #6  0x00000000006ea75a in 
>> _D5asgen8handlers11iconhandler11IconHandler21possibleIconFilenamesMFAyaSQCs5utils9ImageSizebZ9__lambda4MFZv (this=0x7fde0752bd00)
>>     at ../src/asgen/handlers/iconhandler.d:392
>> #7  0x000000000082fdfa in core.thread.Fiber.run() 
>> (this=0x7fde07528580) at src/core/thread.d:4436
>> #8  0x000000000082fd5d in fiber_entryPoint () at 
>> src/core/thread.d:3665
>> #9  0x0000000000000000 in  ()
>> ```
>
> You probably already figured that the new Fiber seems to be 
> allocating its 16 KB stack, with an additional 4 KB guard page 
> at its bottom, via a 20 KB mmap() call. The abort seems to be 
> triggered by mprotect() returning -1, i.e., a failure to 
> disallow all access to the guard page; so checking `errno` 
> should help.
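
For reference, the allocation pattern described above can be sketched roughly as follows. This is not the druntime code from src/core/thread.d, only a minimal stand-alone approximation assuming a POSIX system with a downward-growing stack; `allocGuardedStack` is a made-up helper name. It maps the usable stack plus one guard region, revokes all access to the guard at the low end, and prints `errno` instead of aborting when either call fails.

```d
import core.stdc.errno : errno;
import core.stdc.stdio : printf;
import core.stdc.string : strerror;
import core.sys.posix.sys.mman;

/// Hypothetical helper (not part of druntime): maps `sz` usable bytes plus
/// `guardPageSize` bytes of guard region below them.
/// Returns the base of the mapping, or null after printing errno.
void* allocGuardedStack(size_t sz, size_t guardPageSize)
{
    immutable total = sz + guardPageSize;

    void* p = mmap(null, total, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANON, -1, 0);
    if (p == MAP_FAILED)
    {
        immutable err = errno; // save before any other libc call
        printf("mmap failed: %s (errno %d)\n", strerror(err), err);
        return null;
    }

    // The guard sits at the lowest address, so a stack overflow faults
    // instead of silently overwriting neighbouring memory.
    if (mprotect(p, guardPageSize, PROT_NONE) != 0)
    {
        immutable err = errno;
        printf("mprotect failed: %s (errno %d)\n", strerror(err), err);
        munmap(p, total);
        return null;
    }
    return p;
}

void main()
{
    // Same sizes as in the backtrace: 16 KB stack + 4 KB guard = 20 KB.
    void* p = allocGuardedStack(16 * 1024, 4096);
    assert(p !is null);
    munmap(p, 16 * 1024 + 4096);
}
```

Saving `errno` into a local before calling `strerror` or `printf` avoids reporting a value that a later libc call has overwritten.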

Yup, I did that already, it just took a really long time to run 
because when I made the change to print errno I also enabled 
detailed GC profiling (via the PRINTF* debug options). Enabling 
the INVARIANT option for the GC is completely broken, by the way: 
I forced it to compile by casting to shared, with the 
result that the GC locks up forever at the start of the program.

Anyway, I think for a change I actually produced some useful 
information via the GC debug options.
Given the following crash:
```
#0  0x00000000007f1d94 in 
_D2gc4impl12conservativeQw3Gcx4markMFNbNlPvQcZv (this=..., 
ptop=0x7fdfce7fc010, pbot=0x7fdfcdbfc010)
     at src/gc/impl/conservative/gc.d:1990
         p1 = 0x7fdfcdbfc010
         p2 = 0x7fdfce7fc010
         stackPos = 0
[...]
```
The scanned range seemed fairly odd to me, so I searched for it 
in the (very verbose!) GC debug output, which yielded:
```
235.244445: 0xc4f090.Gcx::addRange(0x8264230, 0x8264270)
235.244460: 0xc4f090.Gcx::addRange(0x7fdfcdbfc010, 0x7fdfce7fc010)
235.253861: 0xc4f090.Gcx::addRange(0x8264300, 0x8264340)
235.253873: 0xc4f090.Gcx::addRange(0x8264390, 0x82643d0)
```
So, something is calling addRange explicitly there, causing the 
GC to scan a range that it shouldn't scan. Since my code doesn't 
add ranges to the GC, and I looked at the generated code from 
girtod/GtkD and it very much looks fine to me, I am currently 
looking into EMSI containers[1] as the possible culprit.
That library being the issue would also make perfect sense, 
because this issue started to appear with such a frequency only 
after containers were added (there was a GC-related crash before, 
but that might have been a different one).

So, I will look into that addRange call next.
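
To illustrate why a stray addRange matters, here is a small sketch of how code that manages its own memory typically registers it with the GC. `RawBuffer` is a made-up type for illustration and is not taken from the containers library; only `GC.addRange`, `GC.removeRange` and `GC.collect` from core.memory are real API. The GC scans a registered range on every collection until removeRange is called, so if the memory is released or shrunk without unregistering it first, the mark phase reads memory that is no longer valid. Whether that is what happens here is still to be confirmed. The sketch also computes the size of the suspicious range from the log, which comes out at 12 MiB.

```d
import core.memory : GC;
import core.stdc.stdlib : calloc, free;

/// Size of the range from the log line
/// addRange(0x7fdfcdbfc010, 0x7fdfce7fc010).
enum size_t loggedRangeSize = 0x7fdfce7fc010 - 0x7fdfcdbfc010;
static assert(loggedRangeSize == 12 * 1024 * 1024); // 12 MiB

/// Hypothetical container storage: C-heap memory holding pointers
/// into the GC heap.
struct RawBuffer
{
    void** slots;
    size_t length;

    static RawBuffer create(size_t n)
    {
        auto p = cast(void**) calloc(n, (void*).sizeof);
        assert(p !is null);
        // Ask the GC to scan this block for pointers on every collection.
        GC.addRange(p, n * (void*).sizeof);
        return RawBuffer(p, n);
    }

    void release()
    {
        if (slots is null)
            return;
        // Unregister BEFORE freeing; otherwise the next collection
        // would scan memory that is no longer ours.
        GC.removeRange(slots);
        free(slots);
        slots = null;
        length = 0;
    }
}

void main()
{
    auto buf = RawBuffer.create(8);
    buf.slots[0] = cast(void*) new int(42);
    GC.collect(); // the registered range is scanned here
    assert(*cast(int*) buf.slots[0] == 42);
    buf.release();
}
```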

[1]: https://github.com/dlang-community/containers
