NonNull template

Mon Apr 21 17:29:58 UTC 2025

On Saturday, 19 April 2025 at 22:49:19 UTC, Jonathan M Davis 
wrote:
> On Thursday, April 17, 2025 8:39:27 PM MDT Walter Bright via 
> Digitalmars-d wrote:
>> I'd like to know what those gdc and ldc transformations are, 
>> and whether they are controllable with a switch to their 
>> optimizers.
>>
>> I know there's a problem with WASM not faulting on a null 
>> dereference, but in another post I suggested a way to deal 
>> with it.
>
> Unfortunately, my understanding isn't good enough to explain 
> those details. I discussed it with Johan in the past, but I've 
> never worked on ldc or with llvm (or on gdc/gcc), so I really 
> don't know what is or isn't possible. However, from what I 
> recall of what Johan said, we were kind of stuck, and llvm 
> considered dereferencing null to be undefined behavior.

There is a way now to tell LLVM that dereferencing null is 
_defined_ (nota bene) behavior.

> It may be the case that there's some sort of way to control 
> that (and llvm may have more capabilities in that regard since 
> I last discussed it with Johan), but someone who actually knows 
> llvm is going to have to answer those questions. And I don't 
> know how gdc's situation differs either.

I have not responded in this thread so far because I feel it is 
an old discussion, with old misunderstandings.

There is confusion between dereferencing in the language, versus 
dereferencing by the CPU. What I think that C and C++ do very 
well is separate language behavior from implementation/CPU 
behavior, and only prescribe language behavior, no (or very 
little) implementation behavior. I feel D should do the same.

Non-virtual method example, where (in my opinion) the dereference 
happens at call site, not inside the function:

```
class A {
    int a;
    final void foo() { // non-virtual
        a = 1; // no dereference here
    }
}

void main() {
    A a;     // default-initialized to null
    a.foo(); // <-- DEREFERENCE
}
```

During program execution, _with the current D implementation of 
classes and non-virtual methods_, the CPU will only "dereference" 
the `this` pointer when it performs the assignment to `a`. But 
that is only the case for our _current implementation_. For the D 
language behavior, it does not matter what the implementation 
does: the same behavior should happen on any 
architecture/platform/execution model.

If you want to fault on null-dereference, I believe you _have_ to 
add a null-check at every dereference at _language_ level 
(regardless of implementation details). Perhaps it does not 
impact performance very much (with optimizer enabled); I vaguely 
remember a paper from Microsoft where they tried this and did not 
see a big perf impact (if any).
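To make that concrete, here is a minimal sketch of what a language-level check at the call site could look like. `checkedDeref` is a hypothetical helper written for this post, not something a D compiler or druntime provides, and throwing an `Exception` is only a stand-in for whatever "fault" the language would specify (an `Error` or an abort would be equally plausible).

```d
class A
{
    int a;
    final void foo() { a = 1; } // non-virtual
}

/// Hypothetical helper: models the check a compiler could emit at every
/// language-level dereference of a class reference.
T checkedDeref(T)(T obj, string file = __FILE__, size_t line = __LINE__)
if (is(T == class))
{
    if (obj is null)
        throw new Exception("null dereference", file, line);
    return obj;
}

unittest
{
    A a; // default-initialized to null
    bool faulted = false;
    try { checkedDeref(a).foo(); }
    catch (Exception e) { faulted = true; }
    assert(faulted); // detected at the call site, before foo's body runs

    a = new A;
    checkedDeref(a).foo();
    assert(a.a == 1);
}
```

The point of the sketch is only that the check sits where the language-level dereference happens, independent of whether the CPU would touch memory at that point.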

Some notes to trigger you to think about distinguishing language 
behavior from CPU/implementation details:

- You don't _have_ to implement classes and virtual functions 
using a vptr/vtable, there are other options!
- There does not need to be a "stack" (implementation detail 
vocabulary). Some "CPUs" don't have a "stack", and instead do 
"local storage" (language vocabulary) in an alternative way. In 
fact, even on CPUs _with_ stack, it can help to not use it! (read 
about Address Sanitizer detection of stack-use-after-scope and 
ASan's "fake stack")
- Pointers don't have to be memory addresses (you probably 
already know that they are not physical addresses on common 
CPUs); they could probably be implemented as hashes/keys into a 
database as well. C does not define ordered comparison (e.g. > 
and <) for pointers (it's undefined behavior, IIRC), except 
when they point into the same object (e.g. an array or struct). 
Why? Because what would it mean on segmented memory architectures 
(e.g. 16-bit x86)?
- Distinguishing language from implementation behavior means that 
correct programs work the same on all kinds of different 
implementations (e.g. you can run your C++ program in a REPL, or 
run it in your browser through WASM).
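As an illustration of the "pointers as keys" note, here is a small sketch in which a "pointer" is a key into a table rather than a memory address. `Handle` and `Store` are hypothetical names made up for this example, and key 0 is assumed to play the role of `null`. The dereference check is part of the operation itself, so it behaves the same regardless of what the underlying machine does.

```d
/// Hypothetical "pointer": a key into a table, not a memory address.
struct Handle
{
    size_t key; // assumption: 0 plays the role of `null`
}

/// Hypothetical object store that stands in for "memory".
struct Store(T)
{
    private T[size_t] objects;
    private size_t nextKey = 1;

    Handle allocate(T value)
    {
        objects[nextKey] = value;
        return Handle(nextKey++);
    }

    /// Language-level dereference: the validity check is built in.
    ref T deref(Handle h)
    {
        auto slot = h.key in objects;
        if (slot is null)
            throw new Exception("dereference of null or invalid handle");
        return *slot;
    }
}

unittest
{
    Store!int store;
    auto h = store.allocate(41);
    store.deref(h) += 1;
    assert(store.deref(h) == 42);

    Handle nullHandle; // key == 0
    bool faulted = false;
    try { store.deref(nullHandle); }
    catch (Exception e) { faulted = true; }
    assert(faulted);
}
```

Note that ordered comparison of two `Handle` values would have no obvious meaning here, which is the same situation as pointer comparison across unrelated objects.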

cheers,
   Johan