Why does nobody seem to think that `null` is a serious problem in D?

Steven Schveighoffer schveiguy at gmail.com
Tue Nov 20 19:11:46 UTC 2018


On 11/20/18 1:04 PM, Johan Engelen wrote:
> On Tuesday, 20 November 2018 at 03:38:14 UTC, Jonathan M Davis wrote:
>>
>> For @safe to function properly, dereferencing null _must_ be 
>> guaranteed to be memory safe, and for dmd it is, since it will always 
>> segfault. Unfortunately, as I understand it, it is currently possible 
>> with ldc's optimizer to run into trouble, since it'll do things like 
>> see that something must be null and therefore assume that it must 
>> never be dereferenced, since it would clearly be wrong to dereference 
>> it. And then when the code hits a point where it _does_ try to 
>> dereference it, you get undefined behavior. It's something that needs 
>> to be fixed in ldc, but based on discussions I had with Johan at dconf 
>> this year about the issue, I suspect that the spec is going to have to 
>> be updated to be very clear on how dereferencing null has to be 
>> handled before the ldc guys do anything about it. As long as the 
>> optimizer doesn't get involved everything is fine, but as great as 
>> optimizers can be at making code faster, they aren't really written 
>> with stuff like @safe in mind.
> 
> One big problem is the way people talk and write about this issue. There 
> is a difference between "dereferencing" in the language, and reading 
> from a memory address by the CPU.

In general, I always consider "dereferencing" to be the point at which 
code follows a pointer to read or write its data. The semantics of 
changing the type so it means the data rather than a pointer to it seem 
less interesting. Types are compiler-internal things; the actual reads 
and writes are what cause the problems.

But really, it's the act of using a pointer to read or write the data 
it points at that causes the segfault. And in D, we assume that this 
action is @safe because the MMU protects the first page.
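To make that concrete, here's a minimal sketch (assuming a typical OS 
where the first page of the address space is left unmapped):

```
void main() {
    int* p = null;
    int x = *p; // the read is the dereference: it loads from address 0,
                // which is unmapped, so the CPU traps and the process
                // segfaults instead of silently reading garbage
}
```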

> Confusing language semantics with what the CPU is doing happens often in 
> the D community and is not helping these debates.
> 
> D is proclaiming that dereferencing `null` must segfault but that is not 
> implemented by any of the compilers. It would require inserting null 
> checks upon every dereference. (This may not be as slow as you may 
> think, but it would probably not make code run faster.)
> 
> An example:
> ```
> class A {
>     int i;
>     final void foo() {
>         import std.stdio; writeln(__LINE__);
>         // i = 5;
>     }
> }
> 
> void main() {
>     A a;
>     a.foo();
> }
> ```
> 
> In this case, the actual null dereference happens on the last line of 
> main. Since dlang 2.077, however, the program runs fine.

Right, the point is that the segfault happens when a null pointer is 
used to get at the data. If you turn something that is ultimately a 
pointer into another kind of pointer, you aren't really dereferencing 
it. This happens when you pass *pointer to a function that takes a ref 
parameter (or when you pass around a class reference), as sketched below.
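Something like this (the names are mine, purely for illustration):

```
void setTo5(ref int x) {
    x = 5;      // this write is the real dereference; with a null p it
                // stores to address 0 and segfaults
}

void main() {
    int* p = null;
    setTo5(*p); // passing *p doesn't read through p, it just binds the
                // ref to the address p holds -- no memory access here
}
```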

In any case, versions prior to 2.077 didn't segfault; they just had a 
prelude in front of every member function which asserted that `this` 
wasn't null (you actually got a nice stack trace).
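As I remember it, that implicit prelude amounted to roughly this (my 
reconstruction, not literal compiler output; the check was only present 
in non-release builds):

```
class A {
    int i;
    final void foo() {
        assert(this !is null, "null this"); // inserted by the compiler
                                            // before 2.077; fails with
                                            // an assert error and a
                                            // stack trace
        // ... body ...
    }
}
```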

> Now when `foo` is modified such that it writes to member field `i`, the 
> program does segfault (writes to address 0).
> D does not make dereferencing on class objects explicit, which makes it 
> harder to see where the dereference is happening.

Again, the terms are confusing. You just said the dereference happens 
at a.foo(), right? I would consider the dereference to happen when the 
object's data is used, i.e. when you read or write what the pointer 
points at.

> 
> So, I think all compiler implementations are not spec compliant on this 
> point.

I think if the spec says that dereferencing doesn't mean following a 
pointer to its data and reading/writing that data, yet also says that 
null dereferences cause a segfault, then the spec needs to be updated. 
The @safe segfault is what the spec should focus on, not some abstract 
concept that exists only in the compiler.

If it means changing the terminology, then we should do that.

> I think most people believe that compliance is too costly for the kind 
> of software one wants to write in D; the issue is similar to array 
> bounds checking that people explicitly disable or work around.
> For compliance we would need to change the compiler to emit null checks 
> on all @safe dereferences (the opposite direction was chosen in 2.077). 
> It'd be interesting to do the experiment.
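
For reference, the kind of check Johan describes might look roughly 
like this if written out by hand (a sketch only; no D compiler emits 
this today):

```
int readThrough(int* p) @safe {
    if (p is null)
        assert(0, "null dereference"); // assert(0) is never compiled
                                       // out, so this is a guaranteed
                                       // halt instead of UB
    return *p;
}
```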

The whole point of using the MMU instead of instrumentation is that we 
can avoid the performance penalties and still be safe. The only 
loophole is large structures whose fields may extend beyond the 
protected page. In that case I would suggest that the compiler (when 
@safe is enabled) inject an extra read of the front of the data so that 
a segfault is triggered properly; a sketch is below.
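Roughly what I mean (the sizes are illustrative; the real threshold is 
the platform's protected region, typically at least one 4KB page):

```
struct Huge {
    ubyte[8192] pad; // pushes the next field past the protected region
    int tail;
}

void poke(Huge* p) {
    // with p == null this writes to address 8192, which may be mapped,
    // so the MMU gives no segfault guarantee here
    p.tail = 1;

    // the suggested fix amounts to the compiler inserting a probe such
    // as:  cast(void) p.pad[0];  // touches offset 0, faults if p is null
    // before any access that lands beyond the protected region
}
```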

-Steve

