Problem with GC and address/leak sanitizer

Sat Feb 15 23:31:42 UTC 2025

I have a program where the GC *seems* to be overwriting memory 
still in use and corrupting data.

Here's the code. It's massively reduced from the original 
program, and it's hard to reduce further because minor changes 
can prevent the problem from triggering. I'll explain the 
important parts below.

```d
import std.stdio;

struct S {
     int check;
     S* next;
     int[4] data;
}

int main(string[] args) {
     void*[] allocs;
     enum bad_iter = 268;
     for (int n = 0; n < bad_iter+1; n++) {
         allocs.length = 0;
         auto x = "                   ";
         x ~= ' ';

         int[10][] ts;
         for(int i = 0; i < 21; i++) {
             ts.length++;
         }

         S head;
         S* s = &head;
         if (n == bad_iter) {
             n = bad_iter; // convenient line to set a breakpoint only for the last iteration
         }
         for(int i = 0; i < 8; i++) {
             auto ns = new S;
             ns.check = 1; // set test value here
             s.next = ns;
             s = ns;
         }
         s = head.next; // get the first S allocated this iteration
         if (s.check != 1) { // check test value here
             writefln("check=%d", s.check);
             return -1;
         }

         new int[10];
         allocs ~= null;
         new size_t[3];
     }
     return 0;
}
```

The important part is the following. On each iteration we create 
8 instances of S. For each S value, we set its `check` field to 
1. Then we check the value of that field (for the first instance 
of S). When compiled with the address sanitizer, we observe it's 
been corrupted and it's no longer 1.

Am I doing something incorrectly in the code? AFAIK I'm 
respecting the rules required by the GC. Maybe there's a silly 
bug I overlooked?

Tested with LDC 1.40.0 on x86_64 Linux:

```
$ ldc2 app.d -g --frame-pointer=all && ./app # OK
$ ldc2 app.d -fsanitize=address -g --frame-pointer=all && ./app # BUG
check=-337690816
$
```

By setting a watchpoint on the address of the field, I see that 
the code that writes to `check` is part of the GC implementation. 
Here's the backtrace:

```
* thread #1, name = 'app', stop reason = watchpoint 1
   * frame #0: 0x00007ffff7f4695c libdruntime-ldc-shared.so.110`_D4core8internal2gc4impl12conservativeQw3Gcx15recoverNextPageMFNbEQCmQCkQCeQCeQCcQCn4BinsZb + 348
     frame #1: 0x00007ffff7f46278 libdruntime-ldc-shared.so.110`_D4core8internal2gc4impl12conservativeQw3Gcx10smallAllocMFNbmKmkxC8TypeInfoZPv + 776
     frame #2: 0x00007ffff7f417d9 libdruntime-ldc-shared.so.110`_D4core8internal2gc4impl12conservativeQw14ConservativeGC__T9runLockedS_DQCsQCqQCkQCkQCiQCtQBy12mallocNoSyncMFNbmkKmxC8TypeInfoZPvS_DQFaQEyQEsQEsQEqQFb10mallocTimelS_DQGiQGgQGaQGaQFyQGj10numMallocslTmTkTmTxQDlZQFuMFNbKmKkKmKxQEeZQDx + 89
     frame #3: 0x00007ffff7f449d3 libdruntime-ldc-shared.so.110`_DThn16_4core8internal2gc4impl12conservativeQw14ConservativeGC6qallocMFNbmkMxC8TypeInfoZSQDd6memory8BlkInfo_ + 83
     frame #4: 0x00007ffff7f4ddec libdruntime-ldc-shared.so.110`gc_qalloc + 28
     frame #5: 0x000055555556be9a app`_D4core8lifetime__T11_d_newitemTTS3app1SZQwFNaNbNeZPQt at lifetime.d:2837:5
     frame #6: 0x000055555556b745 app`D main(args=string[] @ 0x00007fffffffe438) at app.d:28:13
     frame #7: 0x00007ffff7f68ecd libdruntime-ldc-shared.so.110`_D2rt6dmain212_d_run_main2UAAamPUQgZiZ6runAllMFZv + 77
     frame #8: 0x00007ffff7f68ce7 libdruntime-ldc-shared.so.110`_d_run_main2 + 407
     frame #9: 0x00007ffff7f68b3d libdruntime-ldc-shared.so.110`_d_run_main + 141
     frame #10: 0x000055555556c2b2 app`main(argc=1, argv=0x00007fffffffe728) at entrypoint.d:42:17
     frame #11: 0x00007ffff7745e08 libc.so.6`__libc_start_call_main(main=(app`main at entrypoint.d:39), argc=1, argv=0x00007fffffffe728) at libc_start_call_main.h:58:16
     frame #12: 0x00007ffff7745ecc libc.so.6`__libc_start_main_impl(main=(app`main at entrypoint.d:39), argc=1, argv=0x00007fffffffe728, init=<unavailable>, fini=<unavailable>, rtld_fini=<unavailable>, stack_end=0x00007fffffffe718) at libc-start.c:360:3
     frame #13: 0x000055555556b3a5 app`_start + 37
```
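
Since the write comes from the GC's allocation path, one experiment 
that might help narrow this down is to run the same chain-building 
loop with automatic collections suppressed via `GC.disable()` from 
`core.memory`. This is only a sketch of that experiment, not a fix: 
the loop is simplified from the program above, the constant 
`iterations` is a name made up here, and since minor changes can stop 
the problem from triggering, adding the two `GC` lines at the top of 
the original `main` may be the more reliable way to try it.

```d
import core.memory : GC;
import std.stdio : writefln;

struct S {
    int check;
    S* next;
    int[4] data;
}

int main() {
    // Experiment: ask the runtime not to run automatic collections.
    // (The GC may still collect if it runs out of memory.)
    GC.disable();
    scope (exit) GC.enable();

    enum iterations = 269; // hypothetical name; equals bad_iter + 1 above
    foreach (n; 0 .. iterations) {
        // Allocation pressure similar to the original loop body.
        int[10][] ts;
        foreach (i; 0 .. 21)
            ts.length++;

        // Build a chain of 8 heap nodes rooted in a stack-allocated head.
        S head;
        S* s = &head;
        foreach (i; 0 .. 8) {
            auto ns = new S;
            ns.check = 1; // set test value
            s.next = ns;
            s = ns;
        }

        s = head.next; // first S allocated this iteration
        if (s.check != 1) { // check test value
            writefln("iteration %d: check=%d", n, s.check);
            return -1;
        }
    }
    return 0;
}
```

If the corruption disappears with collections disabled, that would 
suggest the nodes were considered unreachable by an earlier 
collection. If it persists, the cause is more likely elsewhere.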

There is a subsequent write to that memory location in the leak 
sanitizer and LSan complains:

`==4056526==LeakSanitizer has encountered a fatal error.`  
(though usually this message isn't flushed)

I assume the original problem is caused by the GC and ASan/LSan 
are just subsequent victims, but it's hard to be sure. 
Apparently, LSan is enabled automatically on Linux whenever ASan 
is used. The ASan documentation says that LSan "can be 
enabled using `ASAN_OPTIONS=detect_leaks=1` on macOS", but 
setting it to 0 on Linux didn't seem to disable LSan, so I 
couldn't test with ASan alone.
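
Another way to pass the same option, in case the environment 
variable isn't reaching the process, is the `__asan_default_options` 
hook that the ASan runtime looks up at startup. The sketch below 
assumes the hook can be defined from D as an `extern (C)` function 
when building with `-fsanitize=address`. I haven't confirmed that it 
behaves any differently from the environment variable here, and as 
far as I know a value set in `ASAN_OPTIONS` still takes precedence 
over it.

```d
import std.stdio : writeln;

// Hook defined by the sanitizer interface (not by druntime). The ASan
// runtime calls it early during startup to obtain default options.
// Assumption: defining it from D with C linkage overrides the runtime's
// weak default definition.
extern (C) const(char)* __asan_default_options() {
    return "detect_leaks=0"; // string literals in D are null-terminated
}

void main() {
    writeln("built with detect_leaks=0 as the default ASan option");
}
```

The same function could be dropped into `app.d` unchanged to check 
whether the LSan error still appears.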

Any ideas of what might be going on?