Undefined behaviours in D and C

bearophile bearophileHUGS at lycos.com
Mon Apr 19 13:14:13 PDT 2010


Walter Bright:

>D doesn't have this problem because D doesn't have the restrict qualifier.<

So the D2 specs have to state explicitly that any two D pointers may alias each other (and this will make some D code slower than Fortran77 code, where the compiler is allowed to assume that array arguments don't alias).


>If restrict is used incorrectly, however, undefined behavior can result.<

One of the few ways out of this that keeps the language safe is an ownership/lent/etc. extension to the type system. Such extensions are cute, but they are not easy to learn and can become a burden for the D programmer.

Another solution is the restrict keyword, as in C. In a D program restrict is useful only in a few numeric kernels, often less than 30 lines of code, that perform tons of computations in a few loops. In such loops the knowledge that the pointers are distinct can significantly improve the generated code. In all other parts of the program the keyword is useless or not essential. (Such loops can even enjoy a heavier form of compilation, almost a supercompilation. The programmer can even give an attribute like @hot to the loop/function. GCC too has a 'hot' function attribute, but I think in GCC it's not very useful.)
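For comparison, this is roughly what such a kernel looks like in C99 with restrict. It's only a minimal sketch: the axpy name and its signature are made up for illustration, and the speed-up (if any) depends on the compiler.

```c
#include <stddef.h>

/* Hypothetical numeric kernel: y[i] += a * x[i].
   'restrict' is the programmer's promise that x and y never overlap,
   so the compiler may keep values in registers and vectorize the loop.
   Calling this with overlapping arrays is undefined behaviour. */
void axpy(size_t n, double a,
          const double *restrict x, double *restrict y)
{
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}
```

The compiler doesn't verify the promise; a call like axpy(n, a, v, v + 1) compiles fine and is simply wrong.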

I don't know what to think about this. Since D is a systems language, it is expected to offer unsafe features too, like this one. So maybe offering restrict, to be used in very limited situations, can be acceptable in D too.

In many situations the numerical kernels work on arrays, and D arrays have both a pointer and a length, so it's easy to test whether a pointer falls inside such an interval and whether two intervals are fully distinct. Such tests can be done in non-release mode to give a little more safety to the restrict keyword. Some of them can even be kept in release mode if they are outside the heavy loops.
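The interval test can be sketched in C, passing the pointer and the length by hand since C arrays don't carry their length. The names disjoint and scale2 are invented for this example, and comparing the addresses as integers assumes a flat address space:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True when [a, a+na) and [b, b+nb) share no element.
   Addresses are compared as integers because relational comparison of
   pointers into different arrays is not defined in C. */
bool disjoint(const double *a, size_t na, const double *b, size_t nb)
{
    uintptr_t a0 = (uintptr_t)a, a1 = (uintptr_t)(a + na);
    uintptr_t b0 = (uintptr_t)b, b1 = (uintptr_t)(b + nb);
    return a1 <= b0 || b1 <= a0;
}

/* Hypothetical checked kernel: dst[i] = 2 * src[i]. */
void scale2(size_t n, const double *restrict src, double *restrict dst)
{
    /* Done once, outside the heavy loop; compiled out when NDEBUG is defined. */
    assert(disjoint(src, n, dst, n));
    for (size_t i = 0; i < n; i++)
        dst[i] = 2.0 * src[i];
}
```

The check costs a handful of comparisons per call, so it's cheap compared to the loop it protects.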

Maybe something like restrict, but more limited, can be invented: one that works on D arrays only. It would be an extension of the D type system useful for numerical kernels that work on arrays. Something like:

@enforce_restrict(array1, array2, ...) {
    // numerical kernel that uses the arrays
}

Inside that enforce block the D type system knows the arrays are distinct; it's like a restrict applied to their pointers. I don't know if this can work in practical situations. Maybe there's an acceptable solution to this problem of D2.

---------------

I think in C you can't reliably cast a pointer from one type to a different type and then dereference it. I think this is because the C compiler (and the D compiler, I presume) is allowed to assume the two pointers don't alias and to optimize accordingly, making this unsafe/undefined.

This conversion is sometimes done using a union, which is a bit safer than the reinterpret cast:

union Foo2Bar {
   int* iptr;
   double* dptr;
}

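But I think the C standard says that from a union you can't reliably read a field different from the last field you have written (C89 leaves the result implementation-defined and C++ makes it undefined, while later C99 wording permits it as a reinterpretation of the bytes), so that too is unsafe: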

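import std.stdio;
union U { int i; float f; }
void main() {
  U u;
  u.i = 10;
  writeln(u.i); // defined: i was the last field written
  u.f = 10;
  writeln(u.f); // defined: f was the last field written
  writeln(u.i); // undefined by the rule above: i is not the last field written
}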


I think this is not because of endianness problems, but because the compiler can keep values in registers and optimize away the reads/writes inside the union. The D language can state that this is defined, making unions a safer way to statically convert ints to floats, or it can follow the C way to make code a little faster.

Strict aliasing means the compiler may assume that pointers to objects of different types never refer to the same location in memory (character pointers are the usual exception).

See also the -fno-strict-aliasing GCC compiler switch, and related matters:
>>In C99, it is illegal to create an alias of a different type than the original. This is often refered to as the strict aliasing rule.<<
I don't know if D here follows C99 or not.
http://cellperformance.beyond3d.com/articles/2006/06/understanding-strict-aliasing.html
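In C the conversion that doesn't run into the aliasing rule is a memcpy of the bytes, which compilers generally turn into a plain register move. A small sketch, assuming float is 32 bits wide (the function names are made up for the example):

```c
#include <stdint.h>
#include <string.h>

/* Returns the bit pattern of f. Assumes sizeof(float) == sizeof(uint32_t). */
uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);  /* copies bytes, no pointer of the wrong type is dereferenced */
    return u;
}

/* Inverse conversion, under the same size assumption. */
float bits_float(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}
```

Whether D needs the same trick depends on the answer to the question above, that is, on whether D follows the C99 aliasing rule.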

Bye and thank you,
bearophile


