Treating the abusive unsigned syndrome
Russell Lewis
webmaster at villagersonline.com
Tue Nov 25 06:26:52 PST 2008

I'm of the opinion that we should make mixed-sign operations a
compile-time error. I know that it would be annoying in some
situations, but IMHO it gives you clearer, more reliable code.
IMHO, it's a mistake to have implicit casts that lose information.

Want to hear a funny/sad, but somewhat related story? I was chasing
down a segfault recently at work. I hunted and hunted, and finally
found out that the pointer returned from malloc() was bad. I figured
that I was overwriting the heap, right? So I added tracing and
debugging everywhere...no luck.

Finally, in desperation, I added #include <stdlib.h> to the source file (there
was a warning about malloc() not being prototyped)...and the segfaults
vanished!!!

The problem was that the xlc compiler, when it doesn't have the
prototype for a function, assumes that it returns int...but int is 32
bits. Moreover, the compiler was happily implicitly casting that int to
a pointer...which was 64 bits.

The compiler was silently cropping the top 32 bits off my pointers.
And it all was a "feature" to make programming "easier."
Russ
Andrei Alexandrescu wrote:
> D pursues compatibility with C and C++ in the following manner: if a
> code snippet compiles in both C and D or C++ and D, then it should have
> the same semantics.
>
> A classic problem with C and C++ integer arithmetic is that any
> operation involving at least one unsigned integral automatically receives
> an unsigned type, regardless of how silly that actually is,
> semantically. About the only advantage of this rule is that it's simple.
> IMHO it only has disadvantages from then on.
>
> The following operations suffer from the "abusive unsigned syndrome" (u
> is an unsigned integral, i is a signed integral):
>
> (1) u + i, i + u
> (2) u - i, i - u
> (3) u - u
> (4) u * i, i * u, u / i, i / u, u % i, i % u (compatibility with C
> requires that these all return unsigned, ouch)
> (5) u < i, i < u, u <= i etc. (all ordering comparisons)
> (6) -u
>
> Logic operations &, |, and ^ also yield unsigned, but such cases are
> less abusive because at least the operation wasn't arithmetic in the
> first place. Comparing for equality is also quite a conundrum - should
> minus two billion compare equal to 2_294_967_296? I'll ignore these for
> now and focus on (1) - (6).
>
> So far we haven't found a solid solution to this problem that at the
> same time lets "good" code pass through, weeds out "bad" code, and is
> compatible with C and C++. The closest I got was to have the compiler
> define the following internal types:
>
> __intuint
> __longulong
>
> I've called them "dual-signed integers" in the past, but let's try the
> shorter "undecided sign". Each of these is a subtype of both the signed
> and the unsigned integral in its name, e.g. __intuint is a subtype of
> both int and uint. (Originally I thought of defining __byteubyte and
> __shortushort as well but dropped them in the interest of simplicity.)
>
> The sign-ambiguous operations (1) - (6) yield __intuint if no operand
> size was larger than 32 bits, and __longulong otherwise. Undecided sign
> types define their own operations. Let x and y be values of undecided
> sign. Then x + y, x - y, and -x also return a sign-ambiguous integral
> (the size is that of the largest operand). However, the other operators
> do not work on sign-ambiguous integrals, e.g. x / y would not compile
> because you must decide what sign x and y should have prior to invoking
> the operation. (Rationale: multiplication/division work differently
> depending on the signedness of their operands).
>
> User code cannot define a symbol of sign-ambiguous type, e.g.
>
> auto a = u + i;
>
> would not compile. However, given that __intuint is a subtype of both
> int and uint, it can be freely converted to either whenever there's no
> ambiguity:
>
> int a = u + i; // fine
> uint b = u + i; // fine
>
> The advantage of this scheme is that it weeds out many (most? all?)
> surprises and oddities caused by the abusive unsigned rule of C and C++.
> The disadvantage is that it is more complex and may surprise the novice
> in its own way by refusing to compile code that looks legit.
>
> At the moment, we're in limbo regarding the decision to go forward with
> this. Walter, like many good long-time C programmers, knows the abusive
> unsigned rule so well he's not hurt by it and consequently has little
> incentive to see it as a problem. I have had to teach C and C++ to young
> students coming from Java introductory courses and have a more
> up-to-date perspective on the dangers. My strong belief is that we need
> to address this mess somehow, which type inference will only make more
> painful (in the hands of a beginner, auto can be quite a dangerous tool
> for wrong belief propagation). I also know seasoned programmers who had
> no idea that -u compiles and that it also oddly returns an unsigned type.
>
> Your opinions, comments, and suggestions for improvements would as
> always be welcome.
>
>
> Andrei