[OT] OT: Null checks.

Sun May 4 16:49:51 UTC 2025

On 5/3/2025 9:49 PM, Timon Gehr wrote:
> For the record, even if my application would not run very sluggishly when 
> compiled with DMD, in this particular case it does not matter how accurate the 
> segfault location is as I am not getting any information in the first place.

You suggested in another reply that I have no experience debugging programs that do 
not have a tty attached. This is incorrect. When I've had programs "wink out" 
leaving no trail nor context behind, I add logging code. In particular, I add a 
line to the entry of key functions, something like `fprintf(log, "function 
name\n");` which appends to a log file. Examining the log file gives clues as to 
where the program was when it failed and a trail of how it got there. I add more 
logging statements as needed to gradually close in on where the fault is.
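
A minimal sketch of that kind of entry logging in C. The names `trace_open`, `trace_enter`, `trace_close` and `TRACE_ENTER` are hypothetical, made up for illustration. The `fflush` call is an assumption about what you'd want: it hands each line to the OS right away, so the line is not lost in the stdio buffer if the process dies a moment later.

```c
#include <stdio.h>

/* Hypothetical helper names, for illustration only. */
static FILE *trace_log = NULL;

/* Open the log in append mode so earlier runs are kept. Returns 1 on success. */
int trace_open(const char *path)
{
    trace_log = fopen(path, "a");
    return trace_log != NULL;
}

/* Append one line naming the function that was just entered. */
void trace_enter(const char *function_name)
{
    if (trace_log == NULL)
        return;                 /* logging not enabled, do nothing */
    fprintf(trace_log, "%s\n", function_name);
    fflush(trace_log);          /* hand the line to the OS before anything can crash */
}

/* Close the log on a normal exit. */
void trace_close(void)
{
    if (trace_log != NULL) {
        fclose(trace_log);
        trace_log = NULL;
    }
}

/* Put this on the first line of a key function; __func__ is standard C99. */
#define TRACE_ENTER() trace_enter(__func__)
```

After a crash, the last line in the file names the last key function that was entered, and the lines before it show the path taken to get there.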

> My understanding is that null pointer dereference being UB is a widespread 
> assumption in the LLVM and GCC optimizers. Simply "disabling the behavior" is 
> not practical.

It may be dependent on the optimization level.

There are all kinds of undefined behavior that the LLVM and GCC optimizers just 
delete because, hey, it's undefined behavior that will never happen so it can be 
just deleted. Things like `(x + 1 > x)` for a signed `x` being replaced with `1`. 
There's a way for the compiler to emit a warning when this is done, perhaps try that?

The compiler's ability to check at compile time for a null pointer dereference 
(and hence delete it) is extremely limited. It relies on data flow analysis 
where it can prove the pointer is null, not just "it might be". I read that 
compilers issue a warning when this is the case. Why not try `p = null; *p = 3;` 
in your setup and see if the compiler you're using gives a warning?

If it does, but the warning does not happen with your code, then you know the 
compiler is not deleting the dereference, and so the CPU will check it and seg 
fault at runtime.
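
A C rendering of that probe, as a sketch (the D version is the two statements above; the function names here are hypothetical). The first function is the "provably null" case that data flow analysis can see. The second is the "it might be" case, where the compiler knows nothing about the pointer and has to emit the store. GCC has a `-Wnull-dereference` option, which as far as I know only does anything when optimization is on; what your compiler actually reports is the thing to find out.

```c
#include <stddef.h>

/* Provably null: the compiler can see that p is NULL at the store.
   This is the probe to compile and watch for a warning. Do not call it. */
void store_through_known_null(void)
{
    int *p = NULL;
    *p = 3;
}

/* Might be null: nothing is known about p here, so the store is emitted
   as written and a null p is caught by the hardware at runtime. */
void store_through_unknown(int *p)
{
    *p = 3;
}
```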

> Perhaps I could add `-fsanitize=null` to add null checks, but that would not 
> really solve the main problem as it is not integrated with D scope guards.

It would at least tell you if it is a null pointer dereference or not. That in 
itself would be valuable information.

I have no idea what your program does, but I suspect your most practical option 
is to add logging to a file, and ask your customer to email the file to you when 
it crashes.

Another thing you can try is to build your program with dmd and see if it behaves 
differently regarding the mysterious crash.