Why does nobody seem to think that `null` is a serious problem in D?
Steven Schveighoffer
schveiguy at gmail.com
Tue Nov 20 19:11:46 UTC 2018
On 11/20/18 1:04 PM, Johan Engelen wrote:
> On Tuesday, 20 November 2018 at 03:38:14 UTC, Jonathan M Davis wrote:
>>
>> For @safe to function properly, dereferencing null _must_ be
>> guaranteed to be memory safe, and for dmd it is, since it will always
>> segfault. Unfortunately, as I understand it, it is currently possible
>> with ldc's optimizer to run into trouble, since it'll do things like
>> see that something must be null and therefore assume that it must
>> never be dereferenced, since it would clearly be wrong to dereference
>> it. And then when the code hits a point where it _does_ try to
>> dereference it, you get undefined behavior. It's something that needs
>> to be fixed in ldc, but based on discussions I had with Johan at dconf
>> this year about the issue, I suspect that the spec is going to have to
>> be updated to be very clear on how dereferencing null has to be
>> handled before the ldc guys do anything about it. As long as the
>> optimizer doesn't get involved everything is fine, but as great as
>> optimizers can be at making code faster, they aren't really written
>> with stuff like @safe in mind.
>
> One big problem is the way people talk and write about this issue. There
> is a difference between "dereferencing" in the language, and reading
> from a memory address by the CPU.
In general, I always consider "dereferencing" to be the point at which code
follows a pointer to read or write its data. The semantics of whether the
type denotes the data or a pointer to it seem less interesting. Types are
compiler-internal things; the actual reads and writes are what cause the
problems.
But really, it's the act of using a pointer to read/write the data it
points at which causes the segfault. And in D, we assume that this
action is @safe because of the MMU protecting the first page.
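Something like this minimal sketch (my own untested illustration, not from
the quoted example) shows the distinction: copying or passing a null pointer
around touches no memory; only the actual read through it traps.

```
void main()
{
    int* p = null;
    int* q = p;     // fine: only the pointer value is copied, nothing is read
    // int v = *p;  // the actual dereference: a CPU read of address 0, which
                    // the OS keeps unmapped, so the program segfaults here
}
```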
> Confusing language semantics with what the CPU is doing happens often in
> the D community and is not helping these debates.
>
> D is proclaiming that dereferencing `null` must segfault but that is not
> implemented by any of the compilers. It would require inserting null
> checks upon every dereference. (This may not be as slow as you may
> think, but it would probably not make code run faster.)
>
> An example:
> ```
> class A {
>     int i;
>     final void foo() {
>         import std.stdio; writeln(__LINE__);
>         // i = 5;
>     }
> }
>
> void main() {
>     A a;
>     a.foo();
> }
> ```
>
> In this case, the actual null dereference happens on the last line of
> main. The program runs fine however since dlang 2.077.
Right, the point is that the segfault happens when null pointers are
used to get at the data. If you turn something that is ultimately a
pointer into another type of pointer, then you aren't really
dereferencing it. This happens when you pass `*pointer` into a function
that takes a reference (or when you pass around a class reference).
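As a quick sketch of that case (hypothetical, untested): passing `*p` to a
`ref` parameter just forwards the pointer; nothing is read at the call
site, so there's no segfault until the data is actually touched.

```
void takesRef(ref int r)
{
    // r = 5;  // uncommenting this would be the real dereference (and segfault)
}

void main()
{
    int* p = null;
    takesRef(*p);  // a "dereference" in syntax only; no memory is read here
}
```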
In any case, versions prior to 2.077 didn't segfault; they just had
a prelude in front of every member function that asserted `this`
wasn't null (you actually got a nice stack trace).
> Now when `foo` is modified such that it writes to member field `i`, the
> program does segfault (writes to address 0).
> D does not make dereferencing on class objects explicit, which makes it
> harder to see where the dereference is happening.
Again, the terms are confusing. You just said the dereference happens at
`a.foo()`, right? I would consider the dereference to happen when the
object's data is used, i.e. when you read or write what the pointer
points at.
>
> So, I think all compiler implementations are not spec compliant on this
> point.
I think that if the spec defines "dereferencing" as something other than
following a pointer to its data and reading/writing that data, while also
saying that null dereferences cause a segfault, then the spec needs to be
updated. The @safe segfault is what it should focus on, not some abstract
concept that exists only in the compiler.
If that means changing the terminology, then we should do that.
> I think most people believe that compliance is too costly for the kind
> of software one wants to write in D; the issue is similar to array
> bounds checking that people explicitly disable or work around.
> For compliance we would need to change the compiler to emit null checks
> on all @safe dereferences (the opposite direction was chosen in 2.077).
> It'd be interesting to do the experiment.
The whole point of using the MMU instead of instrumentation is that we
can avoid the performance penalties and still be safe. The only loophole
is large structures that may extend beyond the protected page. I would
suggest that the compiler inject an extra read of the front of any such
data type (when @safe is enabled) so the segfault happens properly.
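To illustrate that loophole (a hypothetical sketch; the exact numbers
depend on the platform's page size): with a struct bigger than the
protected page, a field access through a null pointer can land in mapped
memory instead of trapping.

```
struct Huge
{
    ubyte[8192] pad; // pushes the next field past a typical 4096-byte first page
    int x;           // a null pointer plus x's offset lies outside the protected page
}

void main()
{
    Huge* p = null;
    // p.x = 1;  // not guaranteed to segfault; the write may hit mapped memory
}
```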
-Steve