Treating the abusive unsigned syndrome

Andrei Alexandrescu SeeWebsiteForEmail at erdani.org
Tue Nov 25 07:59:01 PST 2008


D pursues compatibility with C and C++ in the following manner: if a 
code snippet compiles in both C and D or C++ and D, then it should have 
the same semantics.

A classic problem with C and C++ integer arithmetic is that any 
operation involving at least one unsigned integral automatically 
receives an unsigned type, regardless of how silly that actually is 
semantically. About the only advantage of this rule is that it's simple. 
IMHO it only has disadvantages from then on.

The following operations suffer from the "abusive unsigned syndrome" (u 
is an unsigned integral, i is a signed integral):

(1) u + i, i + u
(2) u - i, i - u
(3) u - u
(4) u * i, i * u, u / i, i / u, u % i, i % u (compatibility with C 
requires that these all return unsigned, ouch)
(5) u < i, i < u, u <= i etc. (all ordering comparisons)
(6) -u

Logic operations &, |, and ^ also yield unsigned, but such cases are 
less abusive because at least the operation wasn't arithmetic in the 
first place. Comparing for equality is also quite a conundrum - should 
minus two billion compare equal to 2_294_967_296? I'll ignore these for 
now and focus on (1) - (6).

So far we haven't found a solid solution to this problem that at the 
same time lets "good" code pass through, weeds out "bad" code, and is 
compatible with C and C++. The closest I got was to have the compiler 
define the following internal types:

__intuint
__longulong

I've called them "dual-signed integers" in the past, but let's try the 
shorter "undecided sign". Each of these is a subtype of both the signed 
and the unsigned integral in its name, e.g. __intuint is a subtype of 
both int and uint. (Originally I thought of defining __byteubyte and 
__shortushort as well but dropped them in the interest of simplicity.)

The sign-ambiguous operations (1) - (6) yield __intuint if no operand 
size was larger than 32 bits, and __longulong otherwise. Undecided sign 
types define their own operations. Let x and y be values of undecided 
sign. Then x + y, x - y, and -x also return a sign-ambiguous integral 
(the size is that of the largest operand). However, the other operators 
do not work on sign-ambiguous integrals, e.g. x / y would not compile 
because you must decide what sign x and y should have prior to invoking 
the operation. (Rationale: multiplication and division work differently 
depending on the signedness of their operands.)

User code cannot define a symbol of sign-ambiguous type, e.g.

auto a = u + i;

would not compile. However, given that __intuint is a subtype of both 
int and uint, it can be freely converted to either whenever there's no 
ambiguity:

int a = u + i; // fine
uint b = u + i; // fine

The advantage of this scheme is that it weeds out many (most? all?) 
surprises and oddities caused by the abusive unsigned rule of C and C++. 
The disadvantage is that it is more complex and may surprise the novice 
in its own way by refusing to compile code that looks legit.

At the moment, we're in limbo regarding the decision to go forward with 
this. Walter, like many good long-time C programmers, knows the abusive 
unsigned rule so well he's not hurt by it and consequently has little 
incentive to see it as a problem. I have had to teach C and C++ to young 
students coming from Java introductory courses and have a more 
up-to-date perspective on the dangers. My strong belief is that we need 
to address this mess somehow, which type inference will only make more 
painful (in the hands of a beginner, auto can be quite a dangerous tool 
for wrong belief propagation). I also know seasoned programmers who had 
no idea that -u compiles and that it also oddly returns an unsigned type.

Your opinions, comments, and suggestions for improvements would as 
always be welcome.


Andrei


