Null references redux
Steven Schveighoffer
schveiguy at yahoo.com
Sun Sep 27 08:32:17 PDT 2009
On Sat, 26 Sep 2009 17:08:32 -0400, Walter Bright
<newshound1 at digitalmars.com> wrote:
> Denis Koroskin wrote:
> > On Sat, 26 Sep 2009 22:30:58 +0400, Walter Bright
> > <newshound1 at digitalmars.com> wrote:
> >> D has borrowed ideas from many different languages. The trick is to
> >> take the good stuff and avoid their mistakes <g>.
> >
> > How about this one:
> >
> > http://sadekdrobi.com/2008/12/22/null-references-the-billion-dollar-mistake/
> >
> >
> > :)
>
> I think he's wrong.
>
Analogies aside, we have 2 distinct problems here, with several solutions
for each. I jotted down what I think are the solutions being discussed,
along with the pros and cons of each.
Problem 1. The developer of a function wants to ensure non-null values are
passed into his function.
Solution 1:
   Rely on the hardware's memory protection to do the checking for you.
   Pros: Easy to do, simple to implement, optimal performance (the
hardware is going to do this anyway).
   Cons: Runtime error instead of compile-time, the error doesn't always
occur close to the problem, and it's not always easy to get a stack trace.
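   To make the "error doesn't occur close to the problem" point concrete,
here's a contrived D sketch (all names made up): the mistake is at one
call site, but the hardware trap fires somewhere else entirely.

    import std.stdio;

    class Config { string name; }

    Config global;   // class references default to null in D

    void store(Config c) { global = c; }   // the mistake slips through here

    void useLater()
    {
        writeln(global.name);   // hardware trap fires here, far from the bug
    }

    void main()
    {
        store(null);   // the actual error...
        useLater();    // ...but the crash points here
    }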
Solution 2:
   Check for null as values come into the function, and throw an exception
if the check fails.
   Pros: Works with the exception system.
   Cons: Manual implementation required, a performance hit on every
function call, runtime error instead of compile-time, and the error
doesn't always occur close to the problem.
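   In D this is typically one line per parameter. A minimal sketch using
enforce (found in std.exception in current Phobos), with Widget as a
stand-in type for the example:

    import std.exception : enforce;

    class Widget { void draw() { } }   // stand-in type for the sketch

    void process(Widget w)
    {
        enforce(w !is null, "process: w must not be null");
        w.draw();   // safe past this point, but every call pays for the check
    }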
Solution 3:
   Build the non-null requirement into the function signature (note that
the requirement is opt-in; it's still possible to use null references if
you want).
   Pros: Easy to implement, compile-time error, hard to "work around" by
putting in a dummy value, sometimes no performance hit and most times very
little, allows solutions 1 and 2 if you want, and runtime errors occur AT
THE POINT things went wrong, not later.
   Cons: Non-zero performance hit (you sometimes have to check for null
before assignment!)
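   D has no such signature feature built in, but a library sketch of the
idea might look like this (NonNull is hypothetical, not a Phobos type;
Widget as before):

    // Hypothetical wrapper -- not a Phobos type, just a sketch of the idea.
    struct NonNull(T) if (is(T == class))
    {
        private T payload;

        @disable this();   // no default value allowed

        this(T value)
        {
            // The only null check: at construction, i.e. at assignment.
            assert(value !is null, "null passed to NonNull");
            payload = value;
        }

        T get() { return payload; }
        alias get this;   // a NonNull!T works anywhere a T is expected
    }

    void process(NonNull!Widget w)
    {
        w.draw();   // no check needed; the signature guarantees non-null
    }

   A call site would then look like process(NonNull!Widget(someWidget));
passing null no longer compiles, and the one runtime check happens at the
conversion, exactly where things went wrong.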
Solution 4:
   Perform a null check on every dereference (the Java/C# solution).
   Pros: Works with the exception system, easy to implement.
   Cons: Huge performance hit (except on an OS where the segfault can be
hooked), and the error doesn't always occur close to the problem.
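   Conceptually, the compiler rewrites every dereference along these lines
(a sketch of the idea, not actual codegen), which is where the
per-dereference cost comes from:

    void callDraw(Widget w)
    {
        // Roughly what a plain "w.draw()" becomes under this scheme:
        if (w is null)
            throw new Exception("null dereference");   // NullPointerException analogue
        w.draw();
    }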
-----------------------
Problem 2. The developer declares a reference type but forgets to
initialize it before using it.
Solution 1:
   Assign a default value of null, and rely on the hardware to tell you
later, when you use it, that you screwed up.
   Pros: Easy to do, simple to implement, optimal performance (the
hardware is going to do this anyway).
   Cons: Runtime error instead of compile-time, the error doesn't always
occur close to the problem, and it's not always easy to get a stack trace.
Solution 2:
   Require an assignment at declaration, even if it's an assignment to
null (the "simple" solution).
   Pros: Easy to implement, and it forces the developer to clarify his
requirements -- reminding him that there may be a problem.
   Cons: May be unnecessary, forces the developer to make a decision, and
may result in a dummy value being assigned, reducing to solution 1.
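   In D-like syntax the rule would read something like this (hypothetical;
D doesn't require an initializer today):

    Widget w;                 // error under this rule: no initializer
    Widget x = null;          // allowed: the developer explicitly chose null
    Widget y = new Widget();  // allowed: a real value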
Solution 3:
   Build the requirement that it can't be null into the type itself, so
non-null is checked on assignment. A default value isn't allowed. A
nullable type is still allowed, which reduces to solution 1.
   Pros: Easy to implement, solution 1 is still possible, compile-time
error on misuse, the error occurs at the point things went wrong, no
performance hit (except when you convert a nullable type to a non-nullable
type), and it allows solution 3 for the first problem.
   Cons: Non-zero performance hit when assigning a nullable to a
non-nullable type.
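   Reusing the hypothetical NonNull wrapper sketched above, the conversion
from nullable to non-nullable is the only place a check runs (lookup is a
made-up function that may return null):

    Widget maybe = lookup("button");        // nullable: null is legal here
    auto definite = NonNull!Widget(maybe);  // the one runtime check lives here
    definite.draw();                        // no further checks needed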
Solution 4:
   Have the compiler perform flow analysis, giving an error when an
unassigned variable is used (the C# solution).
   Pros: Compile-time error; with good flow analysis, correct code is
accepted even when assignment isn't done at declaration.
   Cons: Difficult to implement, can incorrectly require assignment if the
flow is too complex, and can force the developer to manually assign null
or a dummy value.
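   For example, C# rejects the analogue of the following with "use of
unassigned local variable"; shown in D syntax for consistency, though D
itself doesn't perform this analysis:

    void example(bool condition)
    {
        Widget w;              // declared, never definitely assigned
        if (condition)
            w = new Widget();
        w.draw();              // flow analysis: error, w may be unassigned
    }

   When the analysis can't see that condition always holds, the developer
is pushed to write Widget w = null; up front, which is the dummy-value con.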
*NOTE* For solution 3 I purposely did NOT include the con that it makes
people assign a dummy value. I believe this argument to be invalid, since
it's much easier to just declare the variable with the nullable equivalent
type (as other people have pointed out). That problem is more a product of
solutions 2 and 4.
----------------------
Anything I missed?
After looking at all the arguments, and brainstorming myself, I think I
prefer non-nullable defaults (I didn't have a position on this concept
before this thread, even though I had given it some thought).
I completely agree with Ary and some others who say "use C# for a while,
and see how much it helps." I wrote C# code for a while, and I got those
errors frequently; usually it was something I had forgotten to initialize
or return. It definitely does not cause the "assign a dummy value"
syndrome that Walter has suggested. Experience with languages that do a
good job of letting the programmer know when he has made an actual mistake
makes a huge difference.
I think the non-nullable default will result in even less of a temptation
to assign a dummy value.
-Steve