UB in D

Sat Jul 9 16:44:07 PDT 2016

On Sat, Jul 09, 2016 at 07:17:59PM -0400, Andrei Alexandrescu via Digitalmars-d wrote:
> On 07/09/2016 06:36 PM, Timon Gehr wrote:
> > Undefined behaviour means the language semantics don't define a
> > successor state for a computation that has not terminated. Do you
> > agree with that definition? If not, what /is/ UB in D, and why is it
> > called UB?
> 
> Yah, I was joking with Walter that effectively the moment you define
> undefined behavior it's not undefined any longer :o). It happens to
> the best of us. I think we're all aligned here.
> 
> There's some interesting interaction here. Consider:
> 
> int fun(int x)
> {
>     int[10] y;
>     ...
>     return ++y[9 >> x];
> }
> 
> Now, under the "shift by negative numbers is undefined" rule, the
> compiler is free to eliminate the bounds check from the indexing
> because it's always within bounds for all defined programs. If it
> isn't, memory corruption may ensue. However, if the compiler says
> "shift by negative numbers is implementation-specified", then the
> compiler cannot portably eliminate the bounds check.

I find this rather disturbing, actually.  There is a fine line between
taking advantage of asserts to elide checks for things the programmer
promises will not happen, and eliding a check merely because the
alternative is defined to be UB, thereby risking memory corruption.

In the above example, I'd be OK with the compiler eliding the bounds
check if there were an assert(x >= 0) either in the function body or in the
in-contract.  Having the compiler elide the bounds check without any
assert or any other indication that the programmer has made assurances
that UB won't occur is very scary to me, as plain ole carelessness can
easily lead to exploitable security holes.  I hope D doesn't become an
example of this kind of security hole.
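
To make that concrete, here is a minimal sketch of the variant I'd be
comfortable with. The name funChecked is mine, the elided part of the
original body is dropped, and I've added an upper bound on x as well,
since as far as I know shifting an int by 32 or more is not defined
either. Treat it as an illustration of the idea, not as a statement of
what any compiler actually does with it.

```d
// Hypothetical variant of the quoted fun(): the in-contract is the
// programmer's explicit promise that the shift count is in range.
int funChecked(int x)
in
{
    assert(x >= 0 && x < 32);
}
do  // spelled `body` on compilers from the time of this thread
{
    int[10] y;              // D default-initializes the elements to 0
    return ++y[9 >> x];     // 9 >> x lies in 0 .. 9 when the contract holds
}

void main()
{
    assert(funChecked(0) == 1);   // touches y[9]
    assert(funChecked(4) == 1);   // 9 >> 4 == 0, touches y[0]
}
```

With the contract in place, an optimizer that drops the bounds check is
relying on something the programmer actually wrote down, rather than on
the absence of a mistake.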

At the very least, I'd expect the compiler to warn that the function
argument may cause UB, and suggest that an in-contract or assert be
added.

On a more technical note, I think eliding the bounds check on the
grounds that shifting by negative x is UB is based on a fallacy. Eliding
a bounds check should only be done when the compiler has the assurance
that the bounds check is not needed. The mere fact that a construct is
UB does not provide that assurance: being UB, there is no way to tell
what index it produces, and hence whether the bounds check is needed or
not. The correct behaviour IMO is therefore to leave the bounds check
in.

To elide simply because negative x is UB basically amounts to saying
"the programmer ought to know better than writing UB code, so therefore
let's just assume that the programmer never makes a mistake and barge
ahead fearlessly FTW!". We all know where blind trust in programmer
reliability leads: security holes galore because humans make mistakes.
Assuming humans don't make mistakes, which is what this kind of
exploitation of UB essentially boils down to, leads to madness.
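
For code that can't afford to bet on how this gets resolved, one
defensive option is to validate the shift count explicitly instead of
leaning on the array bounds check at all. The following is only a
sketch under my own assumptions: funDefensive is a made-up name, the
elided part of the original body is dropped, and the range 0 .. 31 is
what I believe is well-defined for shifting an int. It uses enforce
from std.exception, which, unlike assert, is not compiled out by
-release.

```d
import std.exception : assertThrown, enforce;

// Hypothetical defensive variant: reject a bad shift count up front,
// so the shift itself is never evaluated with an out-of-range count.
int funDefensive(int x)
{
    enforce(x >= 0 && x < 32, "shift count out of range");
    int[10] y;              // elements default-initialize to 0
    return ++y[9 >> x];
}

void main()
{
    assert(funDefensive(0) == 1);      // valid count, touches y[9]
    assertThrown(funDefensive(-1));    // negative count throws an Exception
}
```

This costs a compare and a branch per call, which is the price of not
assuming the caller never makes a mistake.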

> It's a nice example illustrating how things that seem to have nothing
> to do with memory corruption do affect it.
[...]

T

-- 
Stop staring at me like that! It's offens... no, you'll hurt your eyes!