GCC Undefined Behavior Sanitizer

via Digitalmars-d digitalmars-d at puremagic.com
Fri Oct 17 02:46:48 PDT 2014


On Friday, 17 October 2014 at 08:38:12 UTC, Paulo Pinto wrote:
> As an outsider, I think D would be better by having only 
> defined behaviors.

Actually, this is the first thing I would change about D, to make 
it less dependent on x86. I think a system-level language should 
enable maximum optimization on basic types and, rather than 
defining overflow, inject integrity tests for debugging/testing 
or support debug exceptions where available.
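
To make "inject integrity tests for debugging" concrete, here is a
minimal C++ sketch of an add that is checked only when a build flag
is set. The names checked_add and OVERFLOW_CHECKS are hypothetical,
invented for this illustration; __builtin_add_overflow is a GCC/Clang
extension rather than standard C++. This is one possible shape, not
how any particular compiler implements it.

```cpp
#include <climits>
#include <stdexcept>

// Assumption: a debug build would define OVERFLOW_CHECKS as 1 and a
// release build as 0. It defaults to 1 here so the check is visible.
#ifndef OVERFLOW_CHECKS
#define OVERFLOW_CHECKS 1
#endif

// Hypothetical helper: add two ints, reporting overflow when checks
// are enabled and leaving overflow undefined otherwise.
inline int checked_add(int a, int b) {
#if OVERFLOW_CHECKS
    int result;
    // GCC/Clang builtin: returns true if the mathematical sum does
    // not fit in 'result'.
    if (__builtin_add_overflow(a, b, &result)) {
        throw std::overflow_error("integer overflow in checked_add");
    }
    return result;
#else
    return a + b;  // unchecked: the optimizer may assume no overflow
#endif
}
```

With checks on, checked_add(INT_MAX, 1) throws instead of silently
wrapping; with checks off, the compiler is free to optimize as if
the overflow never happens.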

The second thing I would change is to make whole-program analysis 
mandatory so that you can deduce and constrain value ranges. I 
don't buy the argument about separate compilation and commercial 
needs (and even then, augmented object code is a distinct 
possibility). Even FFI is not a great argument: you should be 
able to specify what can happen in a foreign function.

It is just plain wrong to let integers wrap by default in an 
accessible result. That is not how integers behave. The correct 
thing to do is to inject overflow checks in debug mode and let 
overflow in results (that are accessed) be undefined. Otherwise 
you end up giving the compiler a difficult job:

uint y=x+1;
if (x < y){…}

Should be optimized to:

{…}

In D (and C++) you would get:

if (x < ((x+1)&0xffffffff)){…}

As a result you are encouraged to use signed int everywhere in 
C++, since unsigned ints use modulo arithmetic. Unsigned ints in 
C++ are only meant for bit-field stuff. And the C++ designers 
admit that the C++ library is ill-specified because it uses 
unsigned ints for integers that cannot be negative, which is 
now considered bad practice…
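
A small C++ sketch of the x < x+1 comparison discussed above, in both
flavours. The function names are made up for this illustration. The
unsigned version has to keep the comparison because x + 1 wraps to 0
at UINT_MAX; the signed version may be folded to "always true" by an
optimizer precisely because signed overflow is undefined in C++.

```cpp
#include <climits>

// Unsigned arithmetic is modulo 2^N, so x + 1 wraps to 0 when
// x == UINT_MAX and the comparison cannot be removed.
inline bool unsigned_less_than_next(unsigned x) {
    unsigned y = x + 1;
    return x < y;
}

// Signed overflow is undefined behaviour in C++, so a compiler is
// allowed to treat this as always true. Calling it with INT_MAX
// would be undefined, so callers must not do that.
inline bool signed_less_than_next(int x) {
    int y = x + 1;
    return x < y;
}
```

unsigned_less_than_next(UINT_MAX) is false, which is the one input
that stops the compiler from reducing the test to {…}.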

In D it is even worse, since you are forced to use a fixed-size 
modulo even for int, so you cannot do 32-bit arithmetic in a 
64-bit register without getting extra modulo operations.
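
The cost of a fixed 32-bit modulo in a wider register can be sketched
in C++ as follows. This uses unsigned types as a stand-in, since
wrapping is only defined for unsigned in C++; the function names are
hypothetical and the "register" is modelled with a 64-bit value, so
treat it as an illustration rather than a statement about generated
code.

```cpp
#include <cstdint>

// 32-bit add with mandated wrapping: the result is reduced mod 2^32.
inline std::uint32_t add_wrapping32(std::uint32_t a, std::uint32_t b) {
    return a + b;
}

// The same add carried out in a 64-bit value, as it would naturally
// happen in a 64-bit register. The carry out of bit 31 is kept.
inline std::uint64_t add_in_wide_register(std::uint32_t a, std::uint32_t b) {
    return static_cast<std::uint64_t>(a) + b;
}

// The extra modulo step needed to get back the mandated 32-bit
// wrapped result from the wide one.
inline std::uint32_t truncate_to_32(std::uint64_t wide) {
    return static_cast<std::uint32_t>(wide & 0xffffffffu);
}
```

For a = 0xffffffff and b = 1 the wide add yields 0x100000000, and only
the masking step turns that into the 0 that wrapping semantics
require. If overflow were undefined instead, the mask could be
dropped whenever the result stays in the wide register.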

So, "undefined behaviour" is not so bad, as long as you qualify 
it. You could for instance say that overflow on ints leads to an 
unknown value, but no other side effects. That was probably the 
original intent for C, but compiler writers have taken it a step 
further…

D has locked itself to Pentium-style x86 behaviour. Unfortunately 
it is very difficult to have everything be well-defined in a 
low-level programming language. It isn't even obvious that a byte 
should be 8 bits, although the investments in creating UTF-8 
resources on the Internet probably have locked us to it for the 
next 100 years… :)


