Null-checked reference types

Tue Aug 13 13:33:13 UTC 2024

On 13/08/2024 10:33 PM, Quirin Schroll wrote:
> On Monday, 12 August 2024 at 12:02:52 UTC, Richard (Rikki) Andrew 
> Cattermole wrote:
>> On 12/08/2024 10:02 PM, Quirin Schroll wrote:
>>>>>>> No data flow analysis is proposed. Null checking is local and 
>>>>>>> done by tracking ? and ! by the type system.
>>>>>>
>>>>>> DFA is only required if you want the type state to change as the 
>>>>>> function is interpreted. So that's fine. That is a me thing to 
>>>>>> figure out.
>>>>>
>>>>> If I understand correctly, by “type state” you mean something like 
>>>>> value range propagation. It basically *is* value range propagation, 
>>>>> however the ranges in question are `null` and all non-null values. 
>>>>> You don’t suggest `typeof` type of a variable or expression 
>>>>> changes, correct? (I think that would be very weird.)
>>>>
>>>> No, I meant type state.
>>>>
>>>> https://en.wikipedia.org/wiki/Typestate_analysis
>>>>
>>>> unreachable < reachable < initialized < default-initialized < 
>>>> non-null < user
>>>
>>> I didn’t read the Wikipedia article in detail, but it contains no 
>>> “null,” so I’m wondering how it’s related. A variable of non-nullable 
>>> type must be initialized. If we’re talking `@system` code, fine, it 
>>> need not be, it could even be void initialized. IIUC, typestate 
>>> analysis could be used to make void initialization `@safe` by proving 
>>> that a void initialized value has definitely been initialized 
>>> whenever it’s read (i.e. no uninitialized read).
>>>
>>> IIUC, what you’re suggesting is allowing variables of non-null type 
>>> to be initialized by `null`, but that reading one requires them to be 
>>> initialized.
>>
>> No.
>>
>> Initialized, just means it has been initialized. The value, has no 
>> guarantees beyond this.
>>
>> It may be read, it may be mutated.
>>
>> A non-null type state, means that it has been initialized AND its 
>> value isn't the sentinel value null.
>>
>> If it is non-null it may be dereferenced. An initialized pointer may 
>> not be dereferenced as it is lower than non-null.
>>
>>>>>> However, you do not need to annotate function body variables with 
>>>>>> this approach.
>>>>>>
>>>>>> Look at the initializer of a function variable declaration, it'll 
>>>>>> tell you if it has the non-null type state.
>>>>>>
>>>>>> ```d
>>>>>> int* ptr1;
>>>>>> int* ptr2 = ptr1;
>>>>>> ```
>>>>>
>>>>> The only issue is, just because e.g. a pointer is initialized with 
>>>>> something non-null (e.g. the address of a variable), that doesn’t 
>>>>> mean some logic later won’t assign `null` to it.
>>>>
>>>> Right, that would have to be disallowed without DFA, since the type 
>>>> state must not change throughout a function body.
>>>
>>> Why wouldn’t it be able to?
>>
>> You need the DFA to be able to prove the guarantees in the type system 
>> hold.
>>
>> Remove the ability for the type state to change, and you don't need 
>> the DFA.
>>
>>>>>> However the problem which caused me some problems in the past is 
>>>>>> on tracking variables outside of a function. You cannot do it.
>>>>>>
>>>>>> Variables outside a function change type state during their 
>>>>>> lifespan. They have the full life cycle, starting at reachable, 
>>>>>> into non-null and then back to reachable. If you tried to force it 
>>>>>> to be non-null, the language would force you to have an .init 
>>>>>> value that is non-null. This is a known issue with classes 
>>>>>> already. It WILL produce logic errors that are undetectable.
>>>>>
>>>>> I don’t care much about tracking. Probably, with `if (auto) ...`, 
>>>>> you can just rename the variable, but typed non-nullable:
>>>>>
>>>>> ```d
>>>>> void f(int*? p)
>>>>> {
>>>>>      if (int* q = p) ... else return;
>>>>>      int v = *q; // no error, q isn’t nullable, not by analysis, just by type
>>>>> }
>>>>> ```
>>>>
>>>> What matters here is that you do not need to add annotation to the 
>>>> type itself. It only needs to exist within the function signature. 
>>>> Anywhere else its useless information.
>>>
>>> I don’t understand. To me, `Object!` and `Object?` are related but 
>>> different types. You can have arrays of them, etc., how else would 
>>> the information of nullableness be retained?
>>>
>>> Maybe I need some info dump on type state analysis and what you mean 
>>> exactly, because as I understand, TSA would only give you an implicit 
>>> cast from `T?` to `T!` in some cases, similar to how uniqueness gives 
>>> you an implicit cast from `T` to `immutable(T)` in some cases.
>>
>> No, it goes in both directions.
>>
>> Type state analysis is based upon a scale, that has a transfer 
>> function to go up and down it.
>>
>> You start with unreachable, meaning you cannot read or mutate it. Any 
>> access is an error.
>>
>> Next is reachable, you can write to it, but cannot read it. This is 
>> void initialized (uninitialized). When a variable declaration is seen 
>> this is the default prior to handling the initializer expression.
>>
>> Initialized can be both read and mutated. It is the default in D. For 
>> pointers this is the sentinel value null. Aka its nullable. It must 
>> not be dereferenced.
>>
>> Non-null is a pointer proven to not be the sentinel value null. It may 
>> be dereferenced as well as read/mutated.
>>
>> As you increase in the scale, you get more guarantees, and therefore 
>> safety to perform otherwise potentially wrong logic.
>>
>> In such analysis, with a DFA you do it as part of the variables, not 
>> the types.
>>
>> ```d
>> int* var;
>> // type state initialized
>>
>> if (var !is null) {
>>     // type state non-null
>> } // type state min(initialized, non-null)
>>
>> var = new int;
>> // type state non-null
>> ```
> 
> So, yes, basically, if TSA can prove a (nullable) pointer definitely 
> isn’t null at some point, at this point, it may be treated like 
> (including converted to) a non-nullable pointer (e.g. copied to one, be 
> dereferenced, etc.).
> 
> I see two concerns:
> - The guarantees might be really weak, i.e. TSA might not be able to 
> prove much in practice when it comes to non-null.
> - It might be hard to explain why a variable is possibly null at some 
> point. If we don’t even have TSA and the error is “`x` is of nullable 
> type” that’s understandable. I have to copy `x` to a variable that’s of 
> non-null type using a language construct that incurs an assertion or 
> check. On the other hand, with TSA, the compiler must assume the 
> programmer expected TSA to prove something non-null, but it couldn’t, 
> and explaining why might be not very insightful and thus not very 
> actionable.

You only need to store the convergence points to improve the error 
message significantly: anything with multiple scopes, such as switch 
statements, loops, etc. Do that two or three times and you should be able 
to produce a pretty nice error message. However, I won't be implementing 
that. Somebody else can do it; it needs a different skill set that I 
don't currently have.

> Illustrating the first concern:
> ```d
> int** global;
> 
> void remember(ref int* p) @system { global = &p; }
> 
> void setNull() @system { *global = null; }
> 
> void main() @system
> {
>      int* p = new int;
>      // TSA: p is not null here
>      remember(p);
>      // TSA: p is not null here(?)
>      setNull();
>      // TSA: ???
> }
> ```

Right, to do this, you had to drop out of ``@safe``. Making ``@trusted`` 
and ``@system`` code safe is not a design goal of D.

With ``@safe``, escape analysis will mark the by-ref parameter as 
``scope`` and won't let you escape it, preventing this situation.
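
As a rough sketch of that point (the function name ``rememberSafe`` is 
hypothetical, and the exact diagnostic depends on the compiler version and 
on whether ``-preview=dip1000`` is enabled), the escaping assignment from 
your example is the part that gets rejected once the function is ``@safe``:

```d
int** global;

// Hypothetical @safe counterpart of `remember` from the quoted example.
void rememberSafe(ref int* p) @safe
{
    // global = &p; // rejected in @safe code: the address of `p` would escape
    *p = 1;         // using the referenced pointer locally is still fine
}

void main() @safe
{
    int* p = new int;
    rememberSafe(p);
    assert(p !is null && *p == 1);
}
```

With the escape gone, nothing outside ``main`` holds a pointer to ``p``, so 
a later call cannot null it behind the analysis's back.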

> How would TSA “know” that `p` changed after `setNull`? D allows for a 
> lot of action at a distance (mostly because D has pointers).
> 
> My suspicion is that, unfortunately, because TSA has to make 
> conservative assumptions, it’ll have to give us rather weak guarantees 
> after innocuous things happen, like a function call.

No, only when ``@system`` functions are called. It'll be enforced in 
``@safe``, and in ``@trusted`` code it should hopefully be detected when 
one is called.

> I have some experience with C#’s non-nullable types. If you hover over a 
> variable of reference type, it’ll tell you if the variable can be null 
> (initially surprisingly, even if the variable is typed non-null, but 
> that’s because C#’s non-null annotations are more of a suggestion than a 
> guarantee). I don’t know how many people code D with an editor that has 
> some equivalent of IntelliSense. I don’t.
> 
> Drawing from C#, it also does null analysis for properties. You speak of 
> variables, but what about properties?

Fields and globals are not supported due to temporal safety. You must 
load the value into a function variable before accessing or mutating it.

Even if it were supported, another thread could mutate it from under you 
after a check (or a known analysis state), and at CT you wouldn't know 
about it.
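
A minimal sketch of that load-then-check pattern in today's D (``Node`` 
and ``readPayload`` are hypothetical names for illustration; no new 
language feature is assumed):

```d
struct Node
{
    int* payload; // field: other code may change it between a check and a use
}

int readPayload(ref Node n)
{
    int* local = n.payload; // load the field once into a function variable
    if (local !is null)
        return *local;      // the check and the dereference use the same local
    return 0;               // hypothetical fallback when the field is null
}

void main()
{
    Node n;
    assert(readPayload(n) == 0);
    n.payload = new int(42);
    assert(readPayload(n) == 42);
}
```

The check is made on ``local``, so a later write to ``n.payload`` cannot 
invalidate it.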

> The likely explanation for why that is, is that the non-null state is 
> fragile. An initialized variable won’t ever become uninitialized, not 
> because that’s logically impossible, but the language has no operation 
> that would do that.

In D it could, through move constructors.

Same with unreachable: loops and goto reset the reachability of variables.

> A similar issue is with structs’ `init`. I hate it. C++ has it right 
> here using default constructors. A struct with invariants may have its 
> invariants violated by `init`. It must have a constructor run over it to 
> be valid. Here, I’d assume TSA could do some work, but again, action at 
> a distance. Only if we disallow resetting a struct with invariants to 
> `init` do we get way. But then, what about moves?

I'm not touching invariants. I view them as good as is.