Treating the abusive unsigned syndrome

Russell Lewis webmaster at villagersonline.com
Tue Nov 25 06:26:52 PST 2008


I'm of the opinion that we should make mixed-sign operations a 
compile-time error.  I know that it would be annoying in some 
situations, but IMHO it gives you clearer, more reliable code.

IMHO, it's a mistake to have implicit conversions that lose information.


Want to hear a funny/sad, but somewhat related story?  I was chasing 
down a segfault recently at work.  I hunted and hunted, and finally 
found out that the pointer returned from malloc() was bad.  I figured 
that I was overwriting the heap, right?  So I added tracing and 
debugging everywhere...no luck.

I finally, in desperation, added #include <stdlib.h> to the source file 
(there was a warning about malloc() not being prototyped)...and the 
segfaults vanished!!!

The problem was that the xlc compiler, when it doesn't have the 
prototype for a function, assumes that it returns int (the old C 
implicit-declaration rule)...but int is 32 bits.  Moreover, the compiler 
was happily converting that int to a pointer...which was 64 bits.

The compiler was silently cropping the top 32 bits off my pointers.

And it all was a "feature" to make programming "easier."


Russ

Andrei Alexandrescu wrote:
> D pursues compatibility with C and C++ in the following manner: if a 
> code snippet compiles in both C and D or C++ and D, then it should have 
> the same semantics.
> 
> A classic problem with C and C++ integer arithmetic is that any 
> operation involving at least one unsigned integral automatically 
> receives an unsigned type, regardless of how silly that actually is, 
> semantically. About the only advantage of this rule is that it's simple. 
> IMHO it only has disadvantages from then on.
> 
> The following operations suffer from the "abusive unsigned syndrome" (u 
> is an unsigned integral, i is a signed integral):
> 
> (1) u + i, i + u
> (2) u - i, i - u
> (3) u - u
> (4) u * i, i * u, u / i, i / u, u % i, i % u (compatibility with C 
> requires that these all return unsigned, ouch)
> (5) u < i, i < u, u <= i etc. (all ordering comparisons)
> (6) -u
> 
> Logic operations &, |, and ^ also yield unsigned, but such cases are 
> less abusive because at least the operation wasn't arithmetic in the 
> first place. Comparing for equality is also quite a conundrum - should 
> minus two billion compare equal to 2_294_967_296? I'll ignore these for 
> now and focus on (1) - (6).
> 
> So far we haven't found a solid solution to this problem that at the 
> same time allows "good" code to pass through, weeds out "bad" code, and 
> is compatible with C and C++. The closest I got was to have the compiler 
> define the following internal types:
> 
> __intuint
> __longulong
> 
> I've called them "dual-signed integers" in the past, but let's try the 
> shorter "undecided sign". Each of these is a subtype of both the signed 
> and the unsigned integral in its name, e.g. __intuint is a subtype of 
> both int and uint. (Originally I thought of defining __byteubyte and 
> __shortushort as well but dropped them in the interest of simplicity.)
> 
> The sign-ambiguous operations (1) - (6) yield __intuint if no operand 
> size was larger than 32 bits, and __longulong otherwise. Undecided sign 
> types define their own operations. Let x and y be values of undecided 
> sign. Then x + y, x - y, and -x also return a sign-ambiguous integral 
> (the size is that of the largest operand). However, the other operators 
> do not work on sign-ambiguous integrals, e.g. x / y would not compile 
> because you must decide what sign x and y should have prior to invoking 
> the operation. (Rationale: multiplication/division work differently 
> depending on the signedness of their operands).
> 
> User code cannot define a symbol of sign-ambiguous type, e.g.
> 
> auto a = u + i;
> 
> would not compile. However, given that __intuint is a subtype of both 
> int and uint, it can be freely converted to either whenever there's no 
> ambiguity:
> 
> int a = u + i; // fine
> uint b = u + i; // fine
> 
> The advantage of this scheme is that it weeds out many (most? all?) 
> surprises and oddities caused by the abusive unsigned rule of C and C++. 
> The disadvantage is that it is more complex and may surprise the novice 
> in its own way by refusing to compile code that looks legit.
> 
> At the moment, we're in limbo regarding the decision to go forward with 
> this. Walter, like many good long-time C programmers, knows the abusive 
> unsigned rule so well he's not hurt by it and consequently has little 
> incentive to see it as a problem. I have had to teach C and C++ to young 
> students coming from Java introductory courses and have a more 
> up-to-date perspective on the dangers. My strong belief is that we need 
> to address this mess somehow, which type inference will only make more 
> painful (in the hands of the beginner, auto can be quite a dangerous tool 
> for wrong belief propagation). I also know seasoned programmers who had 
> no idea that -u compiles and that it also oddly returns an unsigned type.
> 
> Your opinions, comments, and suggestions for improvements would as 
> always be welcome.
> 
> 
> Andrei


