Null-checked reference types

Tue Aug 13 10:33:01 UTC 2024

On Monday, 12 August 2024 at 12:02:52 UTC, Richard (Rikki) Andrew 
Cattermole wrote:
> On 12/08/2024 10:02 PM, Quirin Schroll wrote:
>>>>>> No data flow analysis is proposed. Null checking is local and
>>>>>> done by tracking ? and ! by the type system.
>>>>>
>>>>> DFA is only required if you want the type state to change 
>>>>> as the function is interpreted. So that's fine. That is a 
>>>>> me thing to figure out.
>>>>
>>>> If I understand correctly, by “type state” you mean 
>>>> something like value range propagation. It basically *is* 
>>>> value range propagation, however the ranges in question are 
>>>> `null` and all non-null values. You don’t suggest `typeof` 
>>>> type of a variable or expression changes, correct? (I think 
>>>> that would be very weird.)
>>>
>>> No, I meant type state.
>>>
>>> https://en.wikipedia.org/wiki/Typestate_analysis
>>>
>>> unreachable < reachable < initialized < default-initialized < 
>>> non-null < user
>> 
>> I didn’t read the Wikipedia article in detail, but it contains 
>> no “null,” so I’m wondering how it’s related. A variable of 
>> non-nullable type must be initialized. If we’re talking 
>> `@system` code, fine, it need not be, it could even be void 
>> initialized. IIUC, typestate analysis could be used to make 
>> void initialization `@safe` by proving that a void initialized 
>> value has definitely been initialized whenever it’s read (i.e. 
>> no uninitialized read).
>> 
>> IIUC, what you’re suggesting is allowing variables of non-null 
>> type to be initialized by `null`, but that reading one 
>> requires them to be initialized.
>
> No.
>
> Initialized, just means it has been initialized. The value, has 
> no guarantees beyond this.
>
> It may be read, it may be mutated.
>
> A non-null type state, means that it has been initialized AND 
> its value isn't the sentinel value null.
>
> If it is non-null it may be dereferenced. An initialized 
> pointer may not be dereferenced as it is lower than non-null.
>
>>>>> However, you do not need to annotate function body 
>>>>> variables with this approach.
>>>>>
>>>>> Look at the initializer of a function variable declaration, 
>>>>> it'll tell you if it has the non-null type state.
>>>>>
>>>>> ```d
>>>>> int* ptr1;
>>>>> int* ptr2 = ptr1;
>>>>> ```
>>>>
>>>> The only issue is, just because e.g. a pointer is 
>>>> initialized with something non-null (e.g. the address of a 
>>>> variable), that doesn’t mean some logic later won’t assign 
>>>> `null` to it.
>>>
>>> Right, that would have to be disallowed without DFA, since 
>>> the type state must not change throughout a function body.
>> 
>> Why wouldn’t it be able to?
>
> You need the DFA to be able to prove the guarantees in the type 
> system hold.
>
> Remove the ability for the type state to change, and you don't 
> need the DFA.
>
>>>>> However the problem which caused me some problems in the 
>>>>> past is on tracking variables outside of a function. You 
>>>>> cannot do it.
>>>>>
>>>>> Variables outside a function change type state during their 
>>>>> lifespan. They have the full life cycle, starting at 
>>>>> reachable, into non-null and then back to reachable. If you 
>>>>> tried to force it to be non-null, the language would force 
>>>>> you to have an .init value that is non-null. This is a 
>>>>> known issue with classes already. It WILL produce logic 
>>>>> errors that are undetectable.
>>>>
>>>> I don’t care much about tracking. Probably, with `if (auto) 
>>>> ...`, you can just rename the variable, but typed 
>>>> non-nullable:
>>>>
>>>> ```d
>>>> void f(int*? p)
>>>> {
>>>>      if (int* q = p) ... else return;
>>>>      // no error: q isn’t nullable, not by analysis, just by type
>>>>      int v = *q;
>>>> }
>>>> ```
>>>
>>> What matters here is that you do not need to add annotation 
>>> to the type itself. It only needs to exist within the 
>>> function signature. Anywhere else it's useless information.
>> 
>> I don’t understand. To me, `Object!` and `Object?` are related 
>> but different types. You can have arrays of them, etc., how 
>> else would the information of nullableness be retained?
>> 
>> Maybe I need some info dump on type state analysis and what 
>> you mean exactly, because as I understand, TSA would only give 
>> you an implicit cast from `T?` to `T!` in some cases, similar 
>> to how uniqueness gives you an implicit cast from `T` to 
>> `immutable(T)` in some cases.
>
> No, it goes in both directions.
>
> Type state analysis is based upon a scale, that has a transfer 
> function to go up and down it.
>
> You start with unreachable, meaning you cannot read or mutate 
> it. Any access is an error.
>
> Next is reachable, you can write to it, but cannot read it. 
> This is void initialized (uninitialized). When a variable 
> declaration is seen this is the default prior to handling the 
> initializer expression.
>
> Initialized can be both read and mutated. It is the default in 
> D. For pointers this is the sentinel value null. Aka it's 
> nullable. It must not be dereferenced.
>
> Non-null is a pointer proven to not be the sentinel value null. 
> It may be dereferenced as well as read/mutated.
>
> As you increase in the scale, you get more guarantees, and 
> therefore safety to perform otherwise potentially wrong logic.
>
> In such analysis, with a DFA you do it as part of the 
> variables, not the types.
>
> ```d
> int* var;
> // type state initialized
>
> if (var !is null) {
> 	// type state non-null
> } // type state min(initialized, non-null)
>
> var = new int;
> // type state non-null
> ```

So, yes, basically, if TSA can prove a (nullable) pointer 
definitely isn’t null at some point, then at that point it may be 
treated like (including converted to) a non-nullable pointer 
(e.g. copied to one, dereferenced, etc.).

I see two concerns:
- The guarantees might be really weak, i.e. TSA might not be able 
to prove much in practice when it comes to non-null.
- It might be hard to explain why a variable is possibly null at 
some point. If we don’t even have TSA and the error is “`x` is of 
nullable type” that’s understandable. I have to copy `x` to a 
variable that’s of non-null type using a language construct that 
incurs an assertion or check. On the other hand, with TSA, the 
compiler must assume the programmer expected TSA to prove 
something non-null, but it couldn’t, and explaining why might 
not be very insightful and thus not very actionable.
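
To make the “language construct that incurs an assertion or check” from the second point concrete, here is a rough sketch in today’s D. `NonNull` is a hypothetical library stand-in for the proposed `T*!`, not an existing Phobos type, and the plain `int*` plays the role of `int*?`. The point is only that the check sits in one visible place, so an error about a nullable value has an obvious fix.

```d
import std.exception : assertThrown, enforce;

// Hypothetical stand-in for the proposed `T*!`; not part of Phobos.
struct NonNull(T)
{
    private T* ptr;

    // Forbids `NonNull!int x;`. Note that `NonNull!int.init` still exists.
    @disable this();

    // The single place where the null check happens.
    this(T* p)
    {
        ptr = enforce(p, "expected a non-null pointer");
    }

    // Dereferencing needs no further check.
    ref T get() { return *ptr; }
}

unittest
{
    int x = 42;
    int* maybe = &x;             // plays the role of `int*?`
    auto q = NonNull!int(maybe); // checked conversion to "non-null"
    assert(q.get == 42);

    int* none = null;
    assertThrown(NonNull!int(none)); // the check fails with an Exception
}
```

This can be tried with `dmd -unittest -main -run file.d`. It says nothing about what TSA could prove on its own; it only shows the explicit fallback.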

Illustrating the first concern:
```d
int** global;

void remember(ref int* p) @system { global = &p; }

void setNull() @system { *global = null; }

void main() @system
{
     int* p = new int;
     // TSA: p is not null here
     remember(p);
     // TSA: p is not null here(?)
     setNull();
     // TSA: ???
}
```

How would TSA “know” that `p` changed after `setNull`? D allows 
for a lot of action at a distance (mostly because D has pointers).

My suspicion is that, unfortunately, because TSA has to make 
conservative assumptions, it’ll have to give us rather weak 
guarantees after innocuous things happen, like a function call.

I have some experience with C#’s non-nullable types. If you hover 
over a variable of reference type, it’ll tell you if the variable 
can be null (initially surprisingly, even if the variable is 
typed non-null, but that’s because C#’s non-null annotations are 
more of a suggestion than a guarantee). I don’t know how many 
people code D with an editor that has some equivalent of 
IntelliSense. I don’t.

C# also does null analysis for properties. You 
speak of variables, but what about properties?

The likely explanation for that is that the non-null 
state is fragile. An initialized variable won’t ever become 
uninitialized, not because that’s logically impossible, but the 
language has no operation that would do that.

A similar issue is with structs’ `init`. I hate it. C++ has it 
right here using default constructors. A struct with invariants 
may have its invariants violated by `init`. It must have a 
constructor run over it to be valid. Here, I’d assume TSA could 
do some work, but again, action at a distance. Only if we 
disallow resetting a struct with invariants to `init` do we get 
anywhere. But then, what about moves?
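
For anyone who hasn’t run into the `init` problem, a minimal sketch of what I mean. `Positive` is a made-up type for illustration. The invariant is only checked in builds that keep contracts (i.e. not with `-release`), and catching `AssertError` is done here purely for demonstration.

```d
import core.exception : AssertError;
import std.exception : assertThrown;

// Made-up example type: the stored value is supposed to be > 0.
struct Positive
{
    private int value; // default is 0, which already violates the invariant

    invariant()
    {
        assert(value > 0, "value must be positive");
    }

    @disable this(); // forbids `Positive p;` but not `Positive.init`

    this(int v) { value = v; }

    // Public member function, so the invariant is checked on entry and exit.
    int get() const { return value; }
}

unittest
{
    auto ok = Positive(3);
    assert(ok.get == 3);

    // Compiles even though default construction is disabled.
    auto bad = Positive.init;
    // The first public call on `bad` trips the invariant.
    assertThrown!AssertError(bad.get);
}
```

So the constructor is the only thing that establishes the invariant, and `init` sidesteps it without any operation that an analysis local to a function would necessarily see.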