disabling unary "-" for unsigned types

Walter Bright newshound1 at digitalmars.com
Tue Feb 16 16:33:11 PST 2010


Steven Schveighoffer wrote:
> We're not working in Assembly here.  This is a high level language, 
> designed to hide the complexities of the underlying processor.  The 
> processor has no idea whether the data in its registers is signed or 
> unsigned.  The high level language does.  Please use that knowledge to 
> prevent stupid mistakes, or is that not one of the goals of the 
> compiler?  I can't believe this is such a hard point to get across.

It's not that I don't understand your point. I do; I just don't agree 
with it. At this point we are going in circles, so I don't think there's 
much value in me reiterating my opinions, except to say that Andrei and 
I once spent a great deal of time trying to separate signed from 
unsigned using the type system. The problem was that expressions tend 
to legitimately mix signed and unsigned types. Trying to tease out the 
"correct" sign of the result and what the programmer might have 
intended turned out to be an inscrutable mess of complication that we 
finally concluded would never work. It's a seductive idea; it just 
doesn't work. That's why C, etc., allows easy implicit conversions 
between signed and unsigned, and why it has a set of (indeed, 
arbitrary) rules for combining them. Even though arbitrary, at least 
those rules are understandable and consistent.

Back when ANSI C was first finalized, there was a raging debate for 
years about whether C should use value-preserving or sign-preserving 
integral promotion rules. There were passionate arguments on both sides, 
both of which claimed the territory of intuitiveness and obviousness.

The end result was that both sides eventually realized there was no 
correct answer, and that an arbitrary decision was required. It was 
made (value-preserving), half of the compiler vendors changed their 
compilers to match, and the rancor was forgotten.
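
Roughly what value-preserving means in practice, as a minimal D sketch 
(D inherited these promotion rules from C; the variable names are mine):

    import std.stdio;

    void main()
    {
        ushort a = 1, b = 2;

        // Value preserving: ushort promotes to int, since int can
        // represent every ushort value. Under sign-preserving rules
        // the result would have been unsigned, wrapping to a huge value.
        auto d = a - b;

        writeln(typeof(d).stringof, ": ", d);  // prints "int: -1"
    }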

For example, let's take two indices into an array, i and j:

    size_t i, j;

size_t is, by definition, unsigned.

Now, to get the distance between two indices:

    auto delta = i - j;

By C's conversion rules, delta is unsigned. If i >= j, which may be an 
invariant of my algorithm, all is well. If i < j, delta suddenly 
becomes a very large value (but it still works, because of wraparound). 
The point is that there is no correct rule for the type of i - j.
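
A minimal, runnable D sketch of that wraparound (the concrete values 
are mine):

    import std.stdio;

    void main()
    {
        size_t i = 3, j = 5;
        auto delta = i - j;  // delta is size_t, i.e. unsigned

        writeln(delta);                  // 18446744073709551614 on 64 bits
        writeln(cast(ptrdiff_t) delta);  // -2 when reinterpreted as signed
    }

This has consequences: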

Now, if j happens instead to be a complicated loop-invariant expression 
(e) in a loop,

    loop
        auto delta = i - (e);

we may instead opt to hoist it out of a loop:

    auto j = -(e);
    loop
        auto delta = i + j;

and suddenly the compiler spits out error messages? Why can I subtract 
an unsigned, but not negate one? Such rules are complicated and will 
seem arbitrary to the user.
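
Modular arithmetic is exactly what makes the hoisted form equivalent 
today. A minimal D sketch (the rejection noted in the comment is the 
proposed behavior, not the current one):

    import std.stdio;

    void main()
    {
        size_t i = 10, e = 3;

        auto direct = i - e;  // 7

        auto j = -e;          // wraps to 2^64 - 3 on 64 bits; the
                              // proposed rule would reject this negation
        auto hoisted = i + j; // 10 + (2^64 - 3) mod 2^64 == 7

        writeln(direct == hoisted);  // prints "true"
    }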


>>> The case I'm talking about is the equivalent to doing:
>>>  x = x / 0;
>>
>> Even mathematicians don't know what to do about divide by zero. But 
>> 2's complement arithmetic is well defined. So the situations are not 
>> comparable.
> 
> Sure they do, the result is infinity.  It's well defined.

I'm not a mathematician, but I believe it is not well defined, as one 
discovers when doing branch cuts. Per IEEE 754 (and required by D), 
floating-point division by 0 resolves to infinity, but not all FPU 
hardware conforms to this spec. There is no similar convention for 
integer division by 0, which is why the C standard leaves it as 
undefined behavior.


