Null-checked reference types
Quirin Schroll
qs.il.paperinik at gmail.com
Tue Aug 13 10:33:01 UTC 2024
On Monday, 12 August 2024 at 12:02:52 UTC, Richard (Rikki) Andrew
Cattermole wrote:
> On 12/08/2024 10:02 PM, Quirin Schroll wrote:
>>>>>> No data flow analysis is proposed. Null checking is local
>>>>>> and done by tracking ? and ! by the type system.
>>>>>
>>>>> DFA is only required if you want the type state to change
>>>>> as the function is interpreted. So that's fine. That is a
>>>>> me thing to figure out.
>>>>
>>>> If I understand correctly, by “type state” you mean
>>>> something like value range propagation. It basically *is*
>>>> value range propagation, however the ranges in question are
>>>> `null` and all non-null values. You don’t suggest `typeof`
>>>> type of a variable or expression changes, correct? (I think
>>>> that would be very weird.)
>>>
>>> No, I meant type state.
>>>
>>> https://en.wikipedia.org/wiki/Typestate_analysis
>>>
>>> unreachable < reachable < initialized < default-initialized <
>>> non-null < user
>>
>> I didn’t read the Wikipedia article in detail, but it contains
>> no “null,” so I’m wondering how it’s related. A variable of
>> non-nullable type must be initialized. If we’re talking
>> `@system` code, fine, it need not be, it could even be void
>> initialized. IIUC, typestate analysis could be used to make
>> void initialization `@safe` by proving that a void initialized
>> value has definitely been initialized whenever it’s read (i.e.
>> no uninitialized read).
>>
>> IIUC, what you’re suggesting is allowing variables of non-null
>> type to be initialized by `null`, but that reading one
>> requires them to be initialized.
>
> No.
>
> Initialized just means it has been initialized. The value has
> no guarantees beyond this.
>
> It may be read, it may be mutated.
>
> A non-null type state means that it has been initialized AND
> its value isn't the sentinel value null.
>
> If it is non-null it may be dereferenced. An initialized
> pointer may not be dereferenced as it is lower than non-null.
>
>>>>> However, you do not need to annotate function body
>>>>> variables with this approach.
>>>>>
>>>>> Look at the initializer of a function variable declaration,
>>>>> it'll tell you if it has the non-null type state.
>>>>>
>>>>> ```d
>>>>> int* ptr1;
>>>>> int* ptr2 = ptr1;
>>>>> ```
>>>>
>>>> The only issue is, just because e.g. a pointer is
>>>> initialized with something non-null (e.g. the address of a
>>>> variable), that doesn’t mean some logic later won’t assign
>>>> `null` to it.
>>>
>>> Right, that would have to be disallowed without DFA, since
>>> the type state must not change throughout a function body.
>>
>> Why wouldn’t it be able to?
>
> You need the DFA to be able to prove the guarantees in the type
> system hold.
>
> Remove the ability for the type state to change, and you don't
> need the DFA.
>
>>>>> However the problem which caused me some problems in the
>>>>> past is on tracking variables outside of a function. You
>>>>> cannot do it.
>>>>>
>>>>> Variables outside a function change type state during their
>>>>> lifespan. They have the full life cycle, starting at
>>>>> reachable, into non-null and then back to reachable. If you
>>>>> tried to force it to be non-null, the language would force
>>>>> you to have an .init value that is non-null. This is a
>>>>> known issue with classes already. It WILL produce logic
>>>>> errors that are undetectable.
>>>>
>>>> I don’t care much about tracking. Probably, with `if (auto)
>>>> ...`, you can just rename the variable, but typed
>>>> non-nullable:
>>>>
>>>> ```d
>>>> void f(int*? p)
>>>> {
>>>>     if (int* q = p) ... else return;
>>>>     int v = *q; // no error, q isn’t nullable, not by
>>>>                 // analysis, just by type
>>>> }
>>>> ```
>>>
>>> What matters here is that you do not need to add annotation
>>> to the type itself. It only needs to exist within the
>>> function signature. Anywhere else it's useless information.
>>
>> I don’t understand. To me, `Object!` and `Object?` are related
>> but different types. You can have arrays of them, etc., how
>> else would the information of nullableness be retained?
>>
>> Maybe I need some info dump on type state analysis and what
>> you mean exactly, because as I understand, TSA would only give
>> you an implicit cast from `T?` to `T!` in some cases, similar
>> to how uniqueness gives you an implicit cast from `T` to
>> `immutable(T)` in some cases.
>
> No, it goes in both directions.
>
> Type state analysis is based upon a scale, that has a transfer
> function to go up and down it.
>
> You start with unreachable, meaning you cannot read or mutate
> it. Any access is an error.
>
> Next is reachable, you can write to it, but cannot read it.
> This is void initialized (uninitialized). When a variable
> declaration is seen this is the default prior to handling the
> initializer expression.
>
> Initialized can be both read and mutated. It is the default in
> D. For pointers this is the sentinel value null. Aka it's
> nullable. It must not be dereferenced.
>
> Non-null is a pointer proven to not be the sentinel value null.
> It may be dereferenced as well as read/mutated.
>
> As you increase in the scale, you get more guarantees, and
> therefore safety to perform otherwise potentially wrong logic.
>
> In such analysis, with a DFA you do it as part of the
> variables, not the types.
>
> ```d
> int* var;
> // type state initialized
>
> if (var !is null) {
>     // type state non-null
> } // type state min(initialized, non-null)
>
> var = new int;
> // type state non-null
> ```

So, yes, basically: if TSA can prove that a (nullable) pointer
definitely isn’t null at some point, then at that point it may be
treated like (including converted to) a non-nullable pointer
(e.g. copied to one, dereferenced, etc.).

I see two concerns:
- The guarantees might be really weak, i.e. TSA might not be able
to prove much in practice when it comes to non-null.
- It might be hard to explain why a variable is possibly null at
some point. Without TSA, an error like “`x` is of nullable type”
is understandable: I have to copy `x` to a variable of non-null
type using a language construct that incurs an assertion or
check. With TSA, on the other hand, the compiler must assume the
programmer expected TSA to prove something non-null, but it
couldn’t, and explaining why it couldn’t might not be very
insightful and thus not very actionable.
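
For the second concern, the purely type-based route can be roughly
approximated in today's D. This is only a sketch under assumptions:
`NonNull`, `assumeNonNull` and `readThrough` are hypothetical names
(not part of the proposal or of Phobos), and plain `int*` stands in
for the proposed `int*?`. The point is that the check happens once,
at an explicit conversion, and everything after it is non-null by
type rather than by analysis.

```d
// Hypothetical wrapper standing in for a non-nullable `T*`.
struct NonNull(T)
{
    private T* ptr;

    @disable this(); // no default construction, which would hold null

    // The only way in: an explicit, checked conversion.
    // Note: `assert` is compiled out with -release.
    this(T* p)
    {
        assert(p !is null, "pointer must not be null");
        ptr = p;
    }

    ref T get() { return *ptr; }
}

// Hypothetical helper: the "language construct that incurs a check".
NonNull!T assumeNonNull(T)(T* p)
{
    return NonNull!T(p);
}

int readThrough(int* maybeNull) // `int*` plays the role of `int*?`
{
    if (maybeNull is null)
        return -1;
    auto q = assumeNonNull(maybeNull); // non-null by type from here on
    return q.get;
}

void main()
{
    int x = 42;
    assert(readThrough(&x) == 42);
    assert(readThrough(null) == -1);
}
```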

Illustrating the first concern:

```d
int** global;
void remember(ref int* p) @system { global = &p; }
void setNull() @system { *global = null; }
void main() @system
{
    int* p = new int;
    // TSA: p is not null here
    remember(p);
    // TSA: p is not null here(?)
    setNull();
    // TSA: ??? (p is in fact null now, via `global`)
}
```

How would TSA “know” that `p` changed after `setNull`? D allows
for a lot of action at a distance (mostly because D has pointers).
My suspicion is that, unfortunately, because TSA has to make
conservative assumptions, it’ll have to give us rather weak
guarantees after innocuous things happen, like a function call.

I have some experience with C#’s non-nullable types. If you hover
over a variable of reference type, it’ll tell you if the variable
can be null (surprisingly at first, even if the variable is typed
non-null; that’s because C#’s non-null annotations are more of a
suggestion than a guarantee). I don’t know how many people code D
with an editor that has some equivalent of IntelliSense. I don’t.

C# also does null analysis for properties. You speak of
variables, but what about properties?

The likely explanation for that is that the non-null state is
fragile. An initialized variable won’t ever become uninitialized,
not because that’s logically impossible, but because the language
has no operation that would do that.

A similar issue is with structs’ `init`. I hate it. C++ has it
right here using default constructors. A struct with invariants
may have its invariants violated by `init`. It must have a
constructor run over it to be valid. Here, I’d assume TSA could
do some work, but again, action at a distance. Only if we
disallow resetting a struct with invariants to `init` do we get
anywhere. But then, what about moves?
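
To make the `init` complaint concrete, here is a minimal sketch in
current D. `Buffer` is a hypothetical type whose intended invariant
is “`data` is never empty”. I model the invariant as a `valid`
method instead of an `invariant` block, purely so the example runs
to completion rather than tripping a contract check.

```d
// Hypothetical type: the intended invariant is data.length > 0.
struct Buffer
{
    private int[] data; // the .init value of an array is empty

    // The constructor is the only place that establishes the invariant.
    this(size_t n)
    {
        assert(n > 0, "buffer must not be empty");
        data = new int[n];
    }

    bool valid() const { return data.length > 0; }
}

void main()
{
    auto a = Buffer(4);
    assert(a.valid);

    Buffer b;        // default-initialized: no constructor runs
    assert(!b.valid);

    a = Buffer.init; // resetting a valid object breaks it as well
    assert(!a.valid);
}
```

The last assignment is the kind of reset that would have to be
disallowed for the invariant to hold everywhere.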