Problem with GC and address/leak sanitizer

Sat Mar 29 00:57:34 UTC 2025

Hi Luis,
   Thanks for spending some time on it.
Here some inline remarks from my side.

On Friday, 28 March 2025 at 00:35:50 UTC, Luís Marques wrote:
>
> 1. **The main issue:** The distributed 
> `libdruntime-ldc-shared.so` appears to be compiled without 
> `version=SupportSanitizers`, so the fake stack scanning isn't 
> enabled.

This is intentional, because there is some overhead associated 
with it. If this is the main issue then that's "OK" and has 
always been the case. Perhaps FakeStack is now enabled by 
default. We should either disable that default, or start shipping 
including the small overhead in druntime.

> 2. **Fake stack support should work when properly enabled:** If 
> you build LDC with `-DRT_SUPPORT_SANITIZERS=True`, the fake 
> stack is scanned as expected.

Great.

> From what I can see (by checking for the proper GC scan calls 
> and addresses), everything is being done correctly. The test 
> `tests/sanitizers/asan_fakestack_GC.d` still fails, but not 
> because of broken fake stack scanning—instead, the test itself 
> is broken. An assertion fails in 
> `test_non_null_does_not_trigger_collection` because memory from 
> a previous iteration is being freed, not because it's freeing 
> memory incorrectly for the current iteration.

The idea of the test is to test that enabling fakestack does not 
change collection behavior. The test is clunky, I'll admit. If it 
is indeed failing already on the fakestack-disabled case, then it 
needs to be adjusted for sure.

> 3. **Redundant function calls:** The runtime function 
> `scanAllTypeImpl` calls `scanStackForASanFakeStack` multiple 
> times for the same fake stack frame, using identical memory 
> ranges each time. While technically correct, this is 
> unnecessary and inefficient. Based on the comment `"This is the 
> pointer we are catching with"`, it seems there's a 
> misunderstanding regarding how the fake stack should be located.

I'm not sure how this can be related to specifically fakestack 
implementation issue. If you mean that `scanAllTypeImpl` calls 
`scanStackForASanFakeStack` multiple times, then that is not a 
problem of our fakestack support code, but of `scanAllTypeImpl` 
itself because it would also call the normal `scan` method 
multiple times with same memory range. That would point towards 
an issue with duplicates in the `ThreadBase.sm_cbeg` list.

Is that what you meant?  (it is literally what you typed, but the 
sentences following it do not make sense with that 
interpretation...)

LLVM does not give us many options to find the fake stack. At the 
time I think I used the only public API method available, which 
indeed results in multiple times getting the same fakestack. We 
can poke into ASan internals but I am afraid it will quickly 
bitrot/break. Perhaps by now ASan's public interface has been 
extended, otherwise we should ask for it (or submit PRs to 
LLVM)...
I'm quite sure we are doing the right thing of locating the 
fakestack, it's done similarly here: 
https://www.aovivo.ilutas.com.br/node-v17.9.1/deps/v8/src/heap/base/stack.cc

> 4. **Misleading comment in the test:** The comment `// Large 
> enough so it will be on fakestack heap (not inlined in local 
> stack frame)` in `asan_fakestack_GC.d` appears to be incorrect. 
> The size of the allocation shouldn't determine this—if its 
> address escapes, it must go into the heap-allocated fake stack 
> regardless.

I vaguely remember there being some optimization with small 
memory sizes, but I cannot find it and cannot trigger it anymore. 
I think it may be related to this comment of mine: 
https://johanengelen.github.io/posts/2017-12-25-ldc-and-addresssanitizer/#fn:4
Hmm... I no longer remember. Regardless, you are right that to 
detect any stackmemory use after return, that memory cannot sit 
on the regular processor stack.

cheers,
   Johan