Null-checked reference types

Mon Aug 12 12:02:52 UTC 2024

On 12/08/2024 10:02 PM, Quirin Schroll wrote:
>>>>> No data flow analysis is proposed. Null checking is local and
>>>>> done by tracking ? and ! by the type system.
>>>>
>>>> DFA is only required if you want the type state to change as the 
>>>> function is interpreted. So that's fine. That is something for me to 
>>>> figure out.
>>>
>>> If I understand correctly, by “type state” you mean something like 
>>> value range propagation. It basically *is* value range propagation, 
>>> where the ranges in question are `null` and all non-null values. 
>>> You don’t suggest that the `typeof` type of a variable or expression 
>>> changes, correct? (I think that would be very weird.)
>>
>> No, I meant type state.
>>
>> https://en.wikipedia.org/wiki/Typestate_analysis
>>
>> unreachable < reachable < initialized < default-initialized < non-null 
>> < user
> 
> I didn’t read the Wikipedia article in detail, but it contains no 
> “null,” so I’m wondering how it’s related. A variable of non-nullable 
> type must be initialized. If we’re talking `@system` code, fine, it need 
> not be, it could even be void initialized. IIUC, typestate analysis 
> could be used to make void initialization `@safe` by proving that a void 
> initialized value has definitely been initialized whenever it’s read 
> (i.e. no uninitialized read).
> 
> IIUC, what you’re suggesting is allowing variables of non-null type to 
> be initialized by `null`, but that reading one requires them to be 
> initialized.

No.

Initialized just means it has been initialized. The value has no 
guarantees beyond this.

It may be read, it may be mutated.

A non-null type state means that it has been initialized AND its value 
isn't the sentinel value null.

If it is non-null, it may be dereferenced. A pointer that is only 
initialized may not be dereferenced, as initialized is lower than 
non-null on the scale.

>>>> However, you do not need to annotate function body variables with 
>>>> this approach.
>>>>
>>>> Look at the initializer of a function variable declaration, it'll 
>>>> tell you if it has the non-null type state.
>>>>
>>>> ```d
>>>> int* ptr1;        // no initializer: initialized (null)
>>>> int* ptr2 = ptr1; // copies ptr1, so also initialized
>>>> ```
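To make the initializer idea concrete, here is a rough sketch in D of deriving a local's type state purely from its initializer. It assumes the four-level scale described later in this thread (unreachable, reachable, initialized, non-null); `TypeState`, `InitKind` and `fromInitializer` are hypothetical names for illustration, not part of any compiler.

```d
import std.stdio : writeln;

// Hypothetical encoding of the four-level scale, lowest to highest.
enum TypeState { unreachable, reachable, initialized, nonNull }

// Hypothetical kinds of initializer a checker might see on a declaration.
enum InitKind { none, voidInit, nullLiteral, addressOf, newExpr, copyOf }

// Derive a local's type state from its initializer alone; `source` is only
// consulted for `copyOf` (e.g. `int* ptr2 = ptr1;`).
TypeState fromInitializer(InitKind kind, TypeState source = TypeState.initialized)
{
    final switch (kind)
    {
        case InitKind.voidInit:
            return TypeState.reachable;   // `int* p = void;`: writable, not readable
        case InitKind.none, InitKind.nullLiteral:
            return TypeState.initialized; // defaults to null: readable, not dereferenceable
        case InitKind.addressOf, InitKind.newExpr:
            return TypeState.nonNull;     // `&x` or `new int` cannot yield null
        case InitKind.copyOf:
            return source;                // inherits the source variable's state
    }
}

void main()
{
    auto ptr1 = fromInitializer(InitKind.none);         // int* ptr1;
    auto ptr2 = fromInitializer(InitKind.copyOf, ptr1); // int* ptr2 = ptr1;
    writeln(ptr1, " ", ptr2); // initialized initialized
}
```

No annotation is needed on the locals themselves: the state falls out of the right-hand side of each declaration.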
>>>
>>> The only issue is, just because e.g. a pointer is initialized with 
>>> something non-null (e.g. the address of a variable), that doesn’t 
>>> mean some logic later won’t assign `null` to it.
>>
>> Right, that would have to be disallowed without DFA, since the type 
>> state must not change throughout a function body.
> 
> Why wouldn’t it be able to?

You need the DFA to be able to prove the guarantees in the type system hold.

Remove the ability for the type state to change, and you don't need the DFA.
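A minimal sketch of that no-DFA rule, assuming the same four-level scale: each local's state is fixed at its declaration for the whole function body, and an assignment is only accepted if the assigned value is at least as high on the scale. `TypeState` and `assignmentAllowed` are hypothetical names.

```d
import std.stdio : writeln;

// Hypothetical four-level scale; member order gives the ordering used by `<`.
enum TypeState { unreachable, reachable, initialized, nonNull }

// Without a DFA a local's state never changes, so an assignment must not
// lower it. Assigning a higher state is fine, but the variable keeps its
// declared state (no upgrade without flow analysis).
bool assignmentAllowed(TypeState declared, TypeState assigned)
{
    return assigned >= declared;
}

void main()
{
    // int x; int* p = &x;  -> p is fixed at nonNull
    auto p = TypeState.nonNull;

    // p = new int;  -> non-null value, state unchanged: accepted
    writeln(assignmentAllowed(p, TypeState.nonNull));     // true

    // p = null;     -> would lower p to initialized: rejected
    writeln(assignmentAllowed(p, TypeState.initialized)); // false
}
```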

>>>> However, the issue that caused me problems in the past is tracking 
>>>> variables outside of a function. You cannot do it.
>>>>
>>>> Variables outside a function change type state during their 
>>>> lifespan. They have the full life cycle, starting at reachable, into 
>>>> non-null and then back to reachable. If you tried to force it to be 
>>>> non-null, the language would force you to have an .init value that 
>>>> is non-null. This is a known issue with classes already. It WILL 
>>>> produce logic errors that are undetectable.
>>>
>>> I don’t care much about tracking. Probably, with `if (auto) ...`, you 
>>> can just rename the variable, but typed non-nullable:
>>>
>>> ```d
>>> void f(int*? p)
>>> {
>>>     if (int* q = p)
>>>     {
>>>         int v = *q; // no error: q isn’t nullable, not by analysis, just by type
>>>     }
>>>     else return;
>>> }
>>> ```
>>
>> What matters here is that you do not need to add an annotation to the 
>> type itself. It only needs to exist within the function signature. 
>> Anywhere else it's useless information.
> 
> I don’t understand. To me, `Object!` and `Object?` are related but 
> different types. You can have arrays of them, etc. How else would the 
> nullability information be retained?
> 
> Maybe I need some info dump on type state analysis and what you mean 
> exactly, because as I understand, TSA would only give you an implicit 
> cast from `T?` to `T!` in some cases, similar to how uniqueness gives 
> you an implicit cast from `T` to `immutable(T)` in some cases.

No, it goes in both directions.

Type state analysis is based upon a scale, with a transfer function 
to move up and down it.

You start with unreachable, meaning you cannot read or mutate it. Any 
access is an error.

Next is reachable: you can write to it, but cannot read it. This is void 
initialized (uninitialized). When a variable declaration is seen, this is 
the default prior to handling the initializer expression.

Initialized can be both read and mutated. It is the default in D. For 
pointers, the default value is the sentinel value null, i.e. an 
initialized pointer is nullable. It must not be dereferenced.

Non-null is a pointer proven to not be the sentinel value null. It may 
be dereferenced as well as read/mutated.

As you move up the scale, you get more guarantees, and therefore the 
safety to perform logic that would otherwise be potentially wrong.
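The permissions on each rung can be sketched as simple threshold checks. This is an illustrative model only, assuming the four states map onto an ordered D enum; `canWrite`, `canRead` and `canDereference` are hypothetical helpers.

```d
import std.stdio : writefln;

// Hypothetical encoding of the scale, lowest to highest.
enum TypeState { unreachable, reachable, initialized, nonNull }

// Each operation requires a minimum rung; higher rungs keep the lower rights.
bool canWrite(TypeState s)       { return s >= TypeState.reachable; }
bool canRead(TypeState s)        { return s >= TypeState.initialized; }
bool canDereference(TypeState s) { return s >= TypeState.nonNull; }

void main()
{
    import std.traits : EnumMembers;

    // Print the permission table for every state.
    foreach (s; EnumMembers!TypeState)
        writefln("%s: write=%s read=%s deref=%s",
                 s, canWrite(s), canRead(s), canDereference(s));
}
```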

In such an analysis, with a DFA, the type state is tracked per 
variable, not as part of the type.

```d
int* var;
// type state initialized

if (var !is null) {
	// type state non-null
} // type state min(initialized, non-null) = initialized

var = new int;
// type state non-null
```
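The snippet can be simulated in a few lines, assuming `!is null` acts as a transfer function into non-null on the true branch and that the join after the `if` takes the minimum of both paths, as the comment in the snippet says. The names are hypothetical and this is a toy walk of one example, not a real analyser.

```d
import std.stdio : writeln;

// Hypothetical four-level scale; member order defines the ordering.
enum TypeState { unreachable, reachable, initialized, nonNull }

// Join at a control-flow merge: only guarantees that hold on every incoming
// path survive, i.e. the minimum of the two states.
TypeState merge(TypeState a, TypeState b)
{
    return a < b ? a : b;
}

// Transfer function for `var !is null` on the true branch. Only a readable
// variable can be tested, so lower states are left unchanged.
TypeState refineNotNull(TypeState s)
{
    return s >= TypeState.initialized ? TypeState.nonNull : s;
}

void main()
{
    TypeState var = TypeState.initialized;     // int* var;

    TypeState thenBranch = refineNotNull(var); // if (var !is null) { ... }
    TypeState elseBranch = var;                // implicit empty else
    var = merge(thenBranch, elseBranch);       // after the if
    writeln(var);                              // initialized

    var = TypeState.nonNull;                   // var = new int;
    writeln(var);                              // nonNull
}
```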