Issues with debugging GC-related crashes #2

Matthias Klumpp mak at debian.org
Wed Apr 18 17:40:56 UTC 2018


On Wednesday, 18 April 2018 at 10:15:49 UTC, Kagamin wrote:
> You can call GC.collect at some points in the program to see if 
> they can trigger the crash

I already do that, and indeed I get crashes. I could throw those 
calls into every function, though, or set a minimal pool size; 
maybe that yields something...
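
A minimal sketch of what throwing those calls into functions could 
look like, so they can be switched on and off at build time. The 
helper name stressCollect and the debug identifier GCStress are 
made up for illustration; only GC.collect from core.memory is real 
druntime API:
```d
import core.memory : GC;

// Hypothetical helper: forces a full collection so that a corrupted
// heap is more likely to blow up near the code that damaged it.
// The call is only compiled in when building with -debug=GCStress (dmd).
void stressCollect()
{
    debug (GCStress) GC.collect();
}

void main()
{
    auto data = new int[](1024);
    stressCollect();              // sprinkle at suspicious points
    assert(data.length == 1024);  // live data must survive the collection
}
```
Without the debug flag the helper compiles to an empty function, so 
the calls can stay in the code while bisecting.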

> https://dlang.org/library/core/memory/gc.collect.html
> If you link against debug druntime, GC can check invariants for 
> correctness of its structures. There's a number of debugging 
> options for GC, though not sure which ones are enabled in 
> default debug build of druntime: 
> https://github.com/ldc-developers/druntime/blob/ldc/src/gc/impl/conservative/gc.d#L1388

I get compile errors for the INVARIANT option, and I don't 
actually know how to deal with those properly:
```
src/gc/impl/conservative/gc.d(1396): Error: shared mutable method core.internal.spinlock.SpinLock.lock is not callable using a shared const object
src/gc/impl/conservative/gc.d(1396):        Consider adding const or inout to core.internal.spinlock.SpinLock.lock
src/gc/impl/conservative/gc.d(1403): Error: shared mutable method core.internal.spinlock.SpinLock.unlock is not callable using a shared const object
src/gc/impl/conservative/gc.d(1403):        Consider adding const or inout to core.internal.spinlock.SpinLock.unlock
```

Commenting out the locks (eww!!) yields no change in behavior 
though.

The crashes always appear in 
https://github.com/dlang/druntime/blob/master/src/gc/impl/conservative/gc.d#L1990

Meanwhile, I also tried to reproduce the crash locally in a 
chroot, with no result. All libraries used on the machine where 
the crashes occur and on my local machine were 100% identical; 
the only differences I am aware of are the hardware (AWS cloud 
vs. home workstation) and the Linux kernel (4.4.0 vs. 4.15.0).

The crash happens whether the code is built with LDC or DMD; the 
compiler doesn't influence the result. Copying over a binary from 
the working machine to the crashing one also results in the same 
errors.

I am completely out of ideas here. Since I think I can rule out a 
hardware fault at Amazon, I don't even know what else would make 
sense to try.

