Opt-in non-null class references?

Wed Feb 28 13:43:37 UTC 2018

Hi,

Andrei said in 2014 that non-null references should be the 
priority of that year's language design, and that making non-null 
the default should be considered. If the code breakage turned out 
to be too high, this could be an opt-in compiler flag.

Discussion here: 
https://forum.dlang.org/post/lcq2il$2emp$1@digitalmars.com

Everybody in the 2014 thread was hyped, but has anything ever 
happened in the language? In November 2017, the D forum discussed 
C#'s non-null warnings. Has anybody thought about this again 
since?

In D, to prevent immense breakage, non-nullable class references 
need to be opt-in. I would love to see them and wouldn't mind 
adapting my 25,000-line D project over a weekend.

Are there any counter-arguments that explain why non-nullable 
references/pointers haven't made it into D yet? Feel free to 
attack my answers below.

* * *

Argument: If A denotes a non-null reference to class A, it can't 
have an init value.
Answer: Let both A?.init and A.init be null, then use code-flow 
analysis.

This would match how D handles immutable: in a class constructor, 
you may assign the value 5 to a field of type immutable(int), whose 
init value is 0. The compiler is happy as long as it can prove that 
we never write a second time during this constructor, and that we 
never read before the first assignment.
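For reference, a minimal sketch of the immutable case in today's D (the class and field names are made up for illustration). The field holds int.init, i.e. 0, until the constructor performs its single permitted write:

```d
// Hypothetical class illustrating D's existing immutable-field rules.
class Counter
{
    immutable int limit; // holds int.init (0) until the constructor writes it

    this()
    {
        limit = 5; // first and only write: accepted inside the constructor
        // limit = 6; // a second write here is what the compiler rejects
    }
}

unittest
{
    auto c = new Counter();
    assert(c.limit == 5);
    // c.limit = 7; // outside a constructor, this write never compiles
}
```

The proposal would apply the same kind of tracking to a field of non-null type A whose init value is null.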

Likewise, it should be legal to assign an expression of type A, 
such as new A(), to a field of type A, and the compiler should be 
happy as long as the field is eventually assigned and never read 
before that assignment. (I haven't contributed to the compiler, so 
I can't testify that it's that easy.)

To allow hacks, it should remain legal to cast A? (nullable 
reference) to A (non-nullable). This should pass compilation 
(because casting shifts all responsibility away from the compiler) 
and, if the reference is actually null, segfault at run time when 
dereferenced, like any null dereference today.

* * *

Argument: I should express non-null with contracts.
Answer: That's indeed the best solution without any language 
change. But it's bloated and doesn't check anything at compile 
time.

     class A { }
     void f1(A a) in { assert(a); } do { f2(a); }
     void f2(A a) in { assert(a); } do { f3(a); }
     void f3(A a) in { assert(a); } do { ...; }
     void g(A a) { if (a) ...; else ...; }

Sturdy D code must look like this today. Some functions handle 
the nulls, others request non-null refs from their callers. The 
function signature should express this, and a contract is part of 
the signature.

But expressing non-null via contracts causes several maintenance 
problems.

First issue: We now rely on unit tests to ensure our types are 
correct. You would do that in dynamic languages, where the type 
system can't otherwise give you meaningful diagnostic errors. I'd 
rather not fall back to this in D. Such tests are easy to forget, 
and coverage analysis doesn't help here: a contract line counts as 
covered as soon as it runs once with a non-null argument.
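To make the first issue concrete, this is roughly the test you end up writing to confirm that f1 really rejects null. It is a minimal sketch: it assumes contracts and asserts are enabled (they are compiled out with -release), uses Phobos' std.exception.assertThrown, and spells the check as `a !is null`, which tests the same thing as `assert(a)` for a null reference.

```d
import core.exception : AssertError;
import std.exception : assertThrown;

class A { }

// Non-null is expressed only as a precondition, checked at run time.
void f1(A a)
in { assert(a !is null); }
do { /* work with a */ }

unittest
{
    f1(new A());                        // non-null argument: contract passes
    assertThrown!AssertError(f1(null)); // null is caught only when this runs
}
```

Nothing stops the code from compiling if this unittest is never written.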

Second issue: Introducing new fields requires updating all 
methods that use those fields, and these aren't necessarily only 
the methods of the class itself. If you have this code:

     class B {
         A a1;
         void f1() in { assert(a1); } do { ... }
         void f2() in { assert(a1); } do { ... }
     }

When you introduce more fields, you must update the contract of 
every method. This is bug-prone; we have final switch (a 
full-blown language feature) just to solve similar issues. It's 
easy to end up with this:

     class B {
         A a1;
         A a2;
         void f1() in { assert(a1); assert(a2); } do { ... }
         void f2() in { assert(a1); /+ forgot +/ } do { ... }
     }
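For comparison, a minimal sketch of the kind of omission final switch catches at compile time (the enum is made up for illustration). If a member is added to Shape without a matching case, the function no longer compiles, whereas the forgotten assert(a2) in f2 above compiles silently.

```d
// Hypothetical enum, used only to illustrate final switch.
enum Shape { circle, square }

string describe(Shape s)
{
    // final switch must handle every member of Shape; adding, say,
    // Shape.triangle without a case here is a compile-time error.
    final switch (s)
    {
        case Shape.circle: return "round";
        case Shape.square: return "four corners";
    }
}
```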

Third issue: Most references in a program aren't null. In 
particular, class references that are fields of another class are 
often initialized once in the constructor and never reassigned. 
This is the predominant use of references. In D, the default, 
implicit case should do the Right Thing; it's fine when 
nonstandard features (allowing null) must be explicit.

Assuming that A means non-null A, I would love this instead:

     class A { }
     void f1(A a) { f2(a); }
     void f2(A a) { f3(a); }
     void f3(A a) { ...; }
     void g(A? a) { if (a) ...; else ...; }
Or:
     void g(A @nullable a) { if (a) ...; else ...; }

Code-flow analysis can already statically check that we 
initialize immutable values only once. Likewise, it should check 
that we only pass an A? to f1 after we have tested it for 
non-null, and that we only call methods on an A? after checking 
it for non-null-ness. (The type of `a' inside the `if' block 
should probably still be A?, not A.)

* * *

Argument: Null references aren't a problem; they're memory-safe.
Answer: Memory safety is not the concern here. Readability of 
code is, and so is catching at compile time what would otherwise 
safely explode at run time.

* * *

Argument: Roll your own non-null type as a wrapper around D's 
nullable class reference.
Answer: That looks ugly, is an abstraction inversion, and checks 
only at run time.

     class A { }

     struct NotNull(T)
         if (is(T == class))
     {
         T payload;
         @disable this();
         this(T t) {
             assert(t !is null);
             payload = t;
         }
         alias payload this;
     }

     NotNull!A a = NotNull!A(new A());

The non-nullable type is the type with simpler behavior: I can 
call all its methods without risking a segfault. The nullable type 
is the more complex type: before calling methods on it, I must 
first check it for non-nullness. My NotNull implements a simple 
type in terms of a more complex type, and such abstraction 
inversion is dubious design.

And, again, this solution only asserts at run time, not at 
compile time.
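Even a tightened-up variant of the wrapper keeps this limitation. In the sketch below (a hypothetical variation, not a library API), the payload is private and exposed only through a read-only getter, so code outside the module can't assign null through alias this. Still, NotNull!A(null) compiles, and the mistake only surfaces when the constructor runs. Note that private in D is module-level, and NotNull!A.init still holds a null payload.

```d
import core.exception : AssertError;
import std.exception : assertThrown;

class A { }

struct NotNull(T)
    if (is(T == class))
{
    private T payload;   // module-private; reachable through get()

    @disable this();     // no default construction with a null payload

    this(T t)
    {
        assert(t !is null, "NotNull constructed from null"); // run-time check only
        payload = t;
    }

    // Read-only access; alias this lets NotNull!T be used where T is expected.
    @property inout(T) get() inout { return payload; }
    alias get this;
}

void useA(A a) { assert(a !is null); }

unittest
{
    auto a = NotNull!A(new A());
    useA(a); // implicit conversion to A via alias this

    // This line compiles; the null is detected only at run time.
    assertThrown!AssertError(NotNull!A(null));
}
```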

Microsoft's C++ Guideline Support Library has not_null<T>. That 
attacks the right problem, but becomes boilerplate when it 
appears everywhere in your codebase.

* * *

Argument: If A is going to denote non-null A, then this will 
break huge amounts of code.
Answer: Like @safe, any such massive break must be opt-in.

The biggest downside of opt-in is that few projects will use it, 
and the feature will be buggy for a long time.

For example, associative arrays in opt-in @safe code, combined 
with an overridden opEquals carrying @safe-nothrow-... 
annotations, can subtly fail when you mix them in complicated 
ways. Sometimes you resort to ripping the good annotations out of 
your project to please the compiler, instead of reducing the 
problem with DustMite.

* * *

Argument: It's not worth it.

I firmly believe it's worth it, but I accept that others deem 
other things more important.

I merely happen to love OOP and use D classes almost everywhere, 
so I have references everywhere, and methods everywhere that take 
references as parameters.

-- Simon

I'll be happy to discuss this in person at DConf 2018. :-)