Disallow null references in safe code?

Sat Feb 1 17:17:03 PST 2014

On Sunday, 2 February 2014 at 01:01:25 UTC, Andrei Alexandrescu 
wrote:
> On 2/1/14, 1:40 PM, deadalnix wrote:
>> On Saturday, 1 February 2014 at 20:09:13 UTC, Andrei 
>> Alexandrescu wrote:
>>> This has been discussed to death a number of times. A field
>>> access obj.field will use addressing with a constant offset.
>>> If that offset is larger than the lowest address allowed to
>>> the application, unsafety may occur.
>>>
>>
>> That is one point. The other point is that the optimizer can
>> remove a null check, and then a load, causing undefined
>> behavior.
>
> I don't understand this. Program crash is defined behavior.
>
> Andrei

This has also been discussed. Let's consider the buggy code
below:

void foo(int* ptr) {
   *ptr;                 // dereference; expected to trap if ptr is null
   if (ptr !is null) {   // null check that comes after the dereference
     // do stuff
   }

   // do other stuff
}

Note that the code presented above looks quite stupid, but this is
typically what you end up with after inlining if you call two
functions, one that does a null check and one that doesn't.
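
To make the inlining scenario above concrete, here is a minimal sketch of how such code can arise. The helper names readValue and updateIfPresent are hypothetical, made up for this illustration; they do not come from any real code base.

```d
// Hypothetical helper that dereferences without checking for null.
int readValue(int* ptr) {
    return *ptr;
}

// Hypothetical helper that checks for null before touching the pointer.
void updateIfPresent(int* ptr) {
    if (ptr !is null) {
        *ptr += 1;
    }
}

// Each call looks reasonable on its own. Once both helpers are
// inlined, the body is a dereference followed by a null check on
// the same pointer, which is the pattern shown in the example above.
void foo(int* ptr) {
    readValue(ptr);        // result unused: becomes a bare load
    updateIfPresent(ptr);  // becomes "if (ptr !is null) { ... }"
}
```

Neither helper is wrong in isolation; the questionable ordering only appears after the compiler merges them.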

You would expect the program to segfault at the first line. But
it is in fact undefined behavior. The optimizer can decide to
remove the null check, since ptr was dereferenced earlier and so
can't be null, and a later pass can then remove the first
dereference because it is a dead load. Both the GCC and LLVM
optimizers can exhibit such behavior.
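
The two passes described above can be sketched by hand as successive versions of the same function. This is only an illustration of the reasoning, not actual GCC or LLVM output; doStuff and the three function names are hypothetical placeholders.

```d
// Hypothetical stand-in for the "do stuff" branch.
void doStuff(int* ptr) {
    *ptr += 1;
}

// As written by the programmer: a load, then a null check.
void fooOriginal(int* ptr) {
    auto unused = *ptr;    // load whose result is never used
    if (ptr !is null) {
        doStuff(ptr);
    }
}

// After the first pass: ptr was already dereferenced, so the
// optimizer assumes it is non-null and drops the check.
void fooAfterCheckRemoval(int* ptr) {
    auto unused = *ptr;
    doStuff(ptr);
}

// After the second pass: the load result is unused, so the load
// is removed as dead. Nothing is left that would trap on a null
// ptr before doStuff runs.
void fooAfterDeadLoadRemoval(int* ptr) {
    doStuff(ptr);
}
```

For any non-null ptr the three versions behave the same, which is why each step is a legal transformation. They only differ when ptr is null, and that is exactly the case the optimizer is allowed to ignore.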

Dereferencing null is not guaranteed to segfault unless we
impose restrictions on the optimizer, such as: do not optimize a
load away unless you can prove it won't trap. That is almost
impossible for the compiler to know, so as a result you won't be
able to optimize most loads away.

Unless we are willing to impose such restrictions on the optimizer
(meaning: recode several passes of the existing optimizers or stop
relying on them, which is a huge manpower cost, and accept poorer
performance), dereferencing null is undefined behavior and can't
be guaranteed to crash.