Memory safe in D
Quirin Schroll
qs.il.paperinik at gmail.com
Thu Apr 25 16:01:44 UTC 2024
On Wednesday, 13 March 2024 at 06:05:35 UTC, Walter Bright wrote:
> Consider the following:
> ```
> class A { void bar(); }
>
> void foo(int i) {
>     A a;
>     if (i) a = new A();
>     ...
>     if (i) a.bar();
> }
> ```
> What happens if we apply data flow analysis to determine the
> state of `a` when it calls `bar()`? It will determine that `a`
> has the possible values (`null`, `new A()`). Hence, it will give
> an error that `a` is possibly null at that point.
A type system can come to this conclusion, no control-flow
analysis needed.
> Yet the code is correct, not buggy.
Depends on what `...` does with `i`.
> Yes, the compiler could figure out that `i` is the same, but
> the conditions can be more complex such that the compiler
> cannot figure it out (the halting problem).
>
> So that doesn't work.
It seems you want the compiler not to diagnose “obvious” cases
where a null check would be superfluous. My sense is the more
formal/mathy inclined people don’t even ask for that.
> We could lower `a.bar()` to `NullCheck(a).bar()` which throws
> an exception if `a` is null. But what have we gained there?
> Nothing. The program still aborts with an exception, just like
> if the hardware checked. Except we've got this manual check
> that costs extra code and CPU time.
Please don’t. Just raise a compile error.
> BTW, doing data flow analysis is very expensive in terms of
> compiler run time. The optimizer does it, but running the
> optimizer is optional for that reason.
You don’t need data flow analysis if the type system can tell
which values are potentially `null` and which aren’t.
Comprehensive example:
```d
// Using hypothetical syntax similar to C# and Kotlin
void foo(int i) {
    // Change 1: Tell the type system that `a` is possibly null.
    A? a;
    if (i) a = new A();
    ...
    // Change 2: A cast to assert `a` is not null.
    if (i) (cast(A)a).bar();
}
```
The question is, will `i` be (effectively) changed in `...`? If
it won’t, the two `if` checks are the same and `a.bar()` would be
fine. Only the type system doesn’t know that. It sees an `A?`
object having a method called on it.
Going from the original code to this would happen like this:
1. Trying to compile, you get an error stating that `A a` must be
either initialized or be a nullable type (i.e. `A? a`) if `a` is
supposed to be potentially `null`. Okay, you think, it’s supposed
to be `null`, i.e. you use `A?`.
2. Trying to compile again, you get an error saying `a` cannot
have a method called on it because its type says it’s possibly
`null`, and you have to make sure somehow that it won’t be.
Options are:
   * Use `a?.bar()`, which only calls `bar` if `a` isn’t null.
   * Use `a!.bar()`, which asserts (throwing an Error) that `a`
     isn’t null and then calls `bar`.
   * Use an explicit cast, which just silences the error, i.e.
     inserts no check, meaning it gives you a segfault if `a` is
     null.
3. So you insert a cast, assuming you don’t meaningfully touch
`i` in `...`.
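
The first two options can be approximated in today’s D, where every class reference is implicitly nullable. This is only a sketch of the intended semantics, not the proposed syntax: `callIfNonNull` and `assertNonNull` are hypothetical helper names standing in for `a?.bar()` and `a!`, and the `calls` counter on `A` exists only to make the effect observable.

```d
class A {
    int calls;                 // demo-only counter, not part of the original `A`
    void bar() { ++calls; }
}

// Hypothetical stand-in for `a?.bar()`: call `bar` only if `a` is non-null.
// Returns whether the call happened.
bool callIfNonNull(A a) {
    if (a is null) return false;
    a.bar();
    return true;
}

// Hypothetical stand-in for `a!`: throw an Error on null,
// otherwise hand the reference back unchanged.
A assertNonNull(A a) {
    if (a is null) throw new Error("reference is null");
    return a;
}

void main() {
    A none = null;
    A some = new A();

    callIfNonNull(none);        // skipped silently
    callIfNonNull(some);        // calls some.bar()
    assertNonNull(some).bar();  // checked, then called
    assert(some.calls == 2);

    // The third option, the plain cast, matches what D does today:
    // `none.bar()` would compile and run without any inserted check.
}
```

The difference to the proposal is that here nothing forces you to pick one of the helpers; with a nullable type, forgetting to choose would be a compile error.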
You’re absolutely right that a segfault is infinitely better than
UB, but a type system that catches potential/likely segfaults
before the program even runs once is infinitely better than
segfaults. In this example, if `...` does change `i`, the cast
is ill-posed and you go back to segfault land. If you’re unsure,
`a!.bar()` is probably better. It’s definitely safer. (And, for
the segfault land enthusiasts, we can add syntax sugar for the
cast: `cast(!null)` for cases when the type is not known or
needlessly long.)
What you do need control-flow analysis for is if you want to
avoid those casts in “obvious” cases (e.g. if the `...` doesn’t
write to `i`). What doesn’t need control-flow analysis are
language constructs that the compiler recognizes as null checks.
Just imagine for a moment D had `final` as a type constructor
which for classes means head-const. On `if (a !is null)` the
compiler can add a new variable `A __a = cast(A)a;` and attempt
to use `__a` instead of `a` in the code block. If it succeeds,
`a` isn’t possibly reassigned `null` and therefore stays
non-null. No additional clutter needed in the block. Recognizing
this pattern isn’t that hard, I hope. It’s definitely less
expensive than control-flow analysis.
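
Written out by hand in today’s D, the rewrite would look roughly like the following. This is a sketch under assumptions: current D has neither `A?` nor `final`, so the plain `A` parameter plays the role of the nullable reference, `checked` is a hypothetical name for the compiler-generated `__a`, and the `calls` counter is there only for the demo.

```d
class A {
    int calls;                 // demo-only counter
    void bar() { ++calls; }
}

void callIfPresent(A a) {
    if (a !is null) {
        // What the compiler would introduce behind the scenes:
        // a fresh local initialized once from the checked reference.
        A checked = a;
        // The block then uses `checked` in place of `a`. In the proposal,
        // a block that tried to assign `null` would fail this rewrite,
        // and the compiler would fall back to treating `a` as nullable.
        checked.bar();
    }
}

void main() {
    auto obj = new A();
    callIfPresent(obj);   // null check passes, `bar` is called
    callIfPresent(null);  // null check fails, nothing happens
    assert(obj.calls == 1);
}
```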