null and type safety

Tue Nov 4 19:54:03 PST 2008

Walter Bright Wrote:

> Brendan Miller wrote:
> > This is obviously a problem. Everyone knows that null pointer
> > exceptions in Java/C#, or segmentation faults in C and C++ are one of
> > the biggest sources of runtime errors.
> 
> Yes, but those are neither type safe errors or memory safe errors. A 
> null pointer is neither mistyped nor can it cause memory corruption.

Well... I can't speak for null pointers in D, but they can definitely cause memory corruption in C++. Not all OSes have memory protection. *remembers the good old days of Mac OS System 7*

Back to the important point!

A couple of times in this thread I've seen people suggest that null pointers are type safe. I don't see how that statement is justifiable. People accept null because it's always been there for those of us who are long-time C coders. What you have to remember is that C was not type safe in any way, shape, or form.

First off, let's clarify that we're talking about *static* type safety. Languages like Python are dynamically type safe because at runtime an exception is thrown if you try to perform an operation on a type that does not support it. If you have a reference in Python, you can point it at whatever the hell you want, and the runtime will prevent you from performing the wrong operation on the wrong data. It's a more limited form of type checking than static type checking, but many people find it acceptable.
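For C++ readers, a rough analogue of a dynamically typed Python variable is C++17's `std::any`: it can hold a value of any type, and the type is checked only when you try to use the value. This is a minimal sketch under that assumption; `holds_int` and `increment` are hypothetical helper names made up for illustration.

```cpp
#include <any>
#include <cassert>
#include <string>

// Hypothetical helper: std::any can hold a value of any type, much like a
// Python name can be rebound to any object. The type check is deferred to
// runtime: std::any_cast throws std::bad_any_cast on a mismatch.
bool holds_int(const std::any& value) {
    try {
        (void)std::any_cast<int>(value);  // throws unless value holds an int
        return true;
    } catch (const std::bad_any_cast&) {
        return false;
    }
}

// Hypothetical helper: adds one to the stored int. Using the wrong type is
// reported by an exception instead of silently misreading memory.
int increment(const std::any& value) {
    return std::any_cast<int>(value) + 1;  // may throw std::bad_any_cast
}
```

With `std::any x = 42;`, `increment(x)` returns 43; after `x = std::string("hello");` the same call throws `std::bad_any_cast`. That is the dynamic flavour of type safety: the wrong operation is stopped, but only when it is attempted.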

In a statically typed language, it is *impossible* to perform an operation on a type that does not support it, because the types of all objects are known at compile time.

Concretely, null is a pointer to address zero. For some type T, there is never any T at address zero. And a statically typed language will prevent you from assigning a pointer to an object that is not of type T to a pointer declared to be of type T. That's *the entire point* of static typing. T* means "that which I point to is in the set of T". T sans the star means "I am in the set of T". Not sometimes. Not maybe. Always.
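To make the set-of-T argument concrete, here is a small C++17 sketch using compile-time checks from `<type_traits>`. The types `T` and `U` and the function `find_t` are hypothetical. Modern C++ spells the null pointer constant `nullptr` rather than `NULL`, but the asymmetry is the same: the compiler rejects a pointer to an unrelated type, yet accepts null with no cast.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

struct T { int value; };      // hypothetical type standing in for "T"
struct U { double other; };   // hypothetical unrelated type

// For real objects the compiler enforces "a T* points into the set of T":
// a pointer to an unrelated type is not implicitly convertible to T*.
static_assert(!std::is_convertible_v<U*, T*>,
              "U* is rejected as a T*");

// ...but the null pointer constant converts to T* with no cast at all,
// even though no T lives at the address it denotes.
static_assert(std::is_convertible_v<std::nullptr_t, T*>,
              "nullptr is accepted as a T*");

// Hypothetical function: its signature promises a T*, yet it may legally
// return null, and nothing forces callers to handle that case.
T* find_t(bool found, T& candidate) {
    return found ? &candidate : nullptr;
}
```

Both `static_assert`s compile, which is exactly the inconsistency: the same type system that polices `U*` waves null through.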

Yes, you can also get performance benefits from type annotations... but that doesn't make the language statically type *safe*.

Now of course, sometimes we do want a pointer to type T to be null... but what does that *mean*? It means you have a variable that sometimes you want to hold a pointer to T... and sometimes you don't.

This is called a variant. Different languages implement variants in different ways and give them different names. In C, they are called unions. C, again, is not type *safe*, so if you try to treat a union as the wrong type, it will let you. In most languages, however, variants are checked dynamically, and thus offer the lesser form of type safety.
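A sketch of the difference between an unchecked C union and a dynamically checked variant, assuming C++17's `std::variant` as the example of the latter. `RawValue`, `Value`, `as_int` and `holds_string` are illustrative names, not taken from any existing codebase.

```cpp
#include <cassert>
#include <string>
#include <variant>

// C-style union (shown for contrast only): nothing records which member is
// active, so reading the wrong one goes undetected.
union RawValue {
    int i;
    float f;
};

// std::variant is a tagged union: it remembers which alternative it holds
// and checks accesses at runtime.
using Value = std::variant<int, std::string>;

// Returns the int if that is what v holds; std::get throws
// std::bad_variant_access if v currently holds a std::string.
int as_int(const Value& v) {
    return std::get<int>(v);
}

// A query that never throws.
bool holds_string(const Value& v) {
    return std::holds_alternative<std::string>(v);
}
```

Treating a `Value` as the wrong type produces an exception rather than garbage, which is the dynamic type safety that the raw union lacks.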

C and C++ pointers to T are variants of type T and the type of NULL. Except, of course, like unions they aren't type safe even dynamically, because the runtime won't stop you from dereferencing null. The operating system *will* stop you by killing your process if you are on a system with protected memory, because address zero is not accessible to userspace on most systems. *Most* systems, not all.
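If C++ pointers behaved like checked variants, dereferencing null would fail in a defined way at runtime instead of depending on the OS trapping address zero. The `checked_ptr` template below is a hypothetical sketch of that idea, not a standard or library type; the choice of `std::logic_error` as the exception is likewise an assumption.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical wrapper: a raw T* viewed as the variant "T or null", plus the
// dynamic check that C and C++ omit. Dereferencing null throws instead of
// being undefined behavior.
template <typename T>
class checked_ptr {
public:
    checked_ptr() = default;                  // starts out null
    explicit checked_ptr(T* p) : p_(p) {}

    bool is_null() const { return p_ == nullptr; }

    T& operator*() const {
        if (p_ == nullptr) {
            throw std::logic_error("dereference of null checked_ptr");
        }
        return *p_;
    }

private:
    T* p_ = nullptr;
};
```

The check costs a branch on every dereference, which is the price of dynamic rather than static safety.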

Think about this in terms of set theory and the idea should become clear. Null should not be assignable to a pointer to T because the object it points to at address zero does not lie within the set of T's. If it did lie within the set of T's, then this should be valid:

T myObject;
myObject = *NULL;

It shouldn't even require a type cast, because type casts are ways of breaking out of static typing. But it does in C++. In fact, without a cast, this code fails to compile with:

error: invalid type argument of `unary *'

Damn right.

Now, really, what's so hard about adding a statically type safe pointer? C++ already has one: references. My complaint here, after all, was that D is apparently less type safe than C++.
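A small C++17 sketch of why references count as the non-nullable option: there is no null reference literal, so the check below happens entirely at compile time. `twice` is a hypothetical example function. The guarantee has a hole: `int& r = *p;` with a null `p` still compiles and is undefined behavior, so the safety holds only as long as nobody manufactures a reference from a null pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// A reference must be bound to an object when it is created; nullptr cannot
// initialize an int&, so "null reference" is rejected at compile time.
static_assert(!std::is_constructible_v<int&, std::nullptr_t>,
              "nullptr cannot bind to a reference");

// Hypothetical function: callers must supply a real int, so the body needs
// no null check.
int twice(const int& n) {
    return 2 * n;
}
```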

Now, I have other problems with C++ references. That they have value semantics is just stupid (especially since they are *called* references!). Type safety and value vs. reference semantics have nothing to do with one another. Indeed, sometimes you might even want a variant to have value semantics. That's why C# added nullable value types.
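To illustrate that nullability and reference semantics are independent, here is a C++17 sketch covering two of the combinations. `non_null_ptr` is a hypothetical, non-standard class that can never be null yet can be re-pointed, unlike a C++ reference. `std::optional<int>` is the standard C++ counterpart of C#'s `int?`, a nullable value. All function names here are made up for illustration.

```cpp
#include <cassert>
#include <optional>

// Hypothetical non-nullable pointer with reference semantics: it can only be
// built from an existing object, but it can be re-pointed later.
template <typename T>
class non_null_ptr {
public:
    explicit non_null_ptr(T& target) : p_(&target) {}
    void rebind(T& target) { p_ = &target; }
    T& operator*() const { return *p_; }
private:
    T* p_;  // never null: only ever set from the address of a T&
};

// With a C++ reference, "r = b" copies b's value into a; r still refers to a.
int reference_assignment_demo() {
    int a = 1, b = 2;
    int& r = a;
    r = b;      // writes 2 into a; does not rebind r
    return a;
}

// With non_null_ptr, rebinding changes what p points at and leaves a alone.
int pointer_rebind_demo() {
    int a = 1, b = 2;
    non_null_ptr<int> p(a);
    p.rebind(b);
    return a;
}

// Nullable value semantics: the optional holds an int by value, or nothing.
std::optional<int> parse_digit(char c) {
    if (c >= '0' && c <= '9') {
        return c - '0';
    }
    return std::nullopt;
}
```

`reference_assignment_demo()` returns 2 because assigning through `r` overwrites `a`, while `pointer_rebind_demo()` returns 1 because rebinding leaves `a` untouched.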

Brendan