Treating the abusive unsigned syndrome

Denis Koroskin 2korden at gmail.com
Tue Nov 25 08:13:43 PST 2008


On Tue, 25 Nov 2008 18:59:01 +0300, Andrei Alexandrescu  
<SeeWebsiteForEmail at erdani.org> wrote:

> D pursues compatibility with C and C++ in the following manner: if a  
> code snippet compiles in both C and D or C++ and D, then it should have  
> the same semantics.
>
> A classic problem with C and C++ integer arithmetic is that any  
> operation involving at least one unsigned integral automatically  
> receives an unsigned type, regardless of how silly that actually is,  
> semantically. About the only advantage of this rule is that it's simple.  
> IMHO it only has disadvantages from then on.
>
> The following operations suffer from the "abusive unsigned syndrome" (u  
> is an unsigned integral, i is a signed integral):
>
> (1) u + i, i + u
> (2) u - i, i - u
> (3) u - u
> (4) u * i, i * u, u / i, i / u, u % i, i % u (compatibility with C  
> requires that these all return unsigned, ouch)
> (5) u < i, i < u, u <= i etc. (all ordering comparisons)
> (6) -u
>
> Logic operations &, |, and ^ also yield unsigned, but such cases are  
> less abusive because at least the operation wasn't arithmetic in the  
> first place. Comparing for equality is also quite a conundrum - should  
> minus two billion compare equal to 2_294_967_296? I'll ignore these for  
> now and focus on (1) - (6).
>
> So far we haven't found a solid solution to this problem that at the  
> same time lets "good" code pass through, weeds out "bad" code, and is  
> compatible with C and C++. The closest I got was to have the compiler  
> define the following internal types:
>
> __intuint
> __longulong
>
> I've called them "dual-signed integers" in the past, but let's try the  
> shorter "undecided sign". Each of these is a subtype of both the signed  
> and the unsigned integral in its name, e.g. __intuint is a subtype of  
> both int and uint. (Originally I thought of defining __byteubyte and  
> __shortushort as well but dropped them in the interest of simplicity.)
>
> The sign-ambiguous operations (1) - (6) yield __intuint if no operand  
> size was larger than 32 bits, and __longulong otherwise. Undecided sign  
> types define their own operations. Let x and y be values of undecided  
> sign. Then x + y, x - y, and -x also return a sign-ambiguous integral  
> (the size is that of the largest operand). However, the other operators  
> do not work on sign-ambiguous integrals, e.g. x / y would not compile  
> because you must decide what sign x and y should have prior to invoking  
> the operation. (Rationale: multiplication/division work differently  
> depending on the signedness of their operands).
>
> User code cannot define a symbol of sign-ambiguous type, e.g.
>
> auto a = u + i;
>
> would not compile. However, given that __intuint is a subtype of both  
> int and uint, it can be freely converted to either whenever there's no  
> ambiguity:
>
> int a = u + i; // fine
> uint b = u + i; // fine
>
> The advantage of this scheme is that it weeds out many (most? all?)  
> surprises and oddities caused by the abusive unsigned rule of C and C++.  
> The disadvantage is that it is more complex and may surprise the novice  
> in its own way by refusing to compile code that looks legit.
>
> At the moment, we're in limbo regarding the decision to go forward with  
> this. Walter, like many good long-time C programmers, knows the abusive  
> unsigned rule so well he's not hurt by it and consequently has little  
> incentive to see it as a problem. I have had to teach C and C++ to young  
> students coming from Java introductory courses and have a more  
> up-to-date perspective on the dangers. My strong belief is that we need  
> to address this mess somehow, which type inference will only make more  
> painful (in the hands of a beginner, auto can be quite a dangerous tool  
> for propagating wrong beliefs). I also know seasoned programmers who had  
> no idea that -u compiles and that it also oddly returns an unsigned type.
>
> Your opinions, comments, and suggestions for improvements would as  
> always be welcome.
>
>
> Andrei
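
For reference, cases (1), (3), (5) and (6) from the quoted list can be reproduced in plain C. This is only a minimal sketch: the function names are made up for illustration, and the checks rely on nothing beyond the usual arithmetic conversions between int and unsigned int.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* (1) u + i: i is converted to unsigned before the addition. */
bool sum_is_positive(unsigned u, int i) { return (u + i) > 0; }

/* (3) u - u: the difference is unsigned, so it is never negative. */
bool difference_is_negative(unsigned a, unsigned b) { return (a - b) < 0; }

/* (5) i < u: i is converted to unsigned before the comparison. */
bool int_less_than_unsigned(int i, unsigned u) { return i < u; }

/* (6) -u: the result is unsigned, reduced modulo UINT_MAX + 1. */
unsigned negate(unsigned u) { return -u; }

/* Each assertion holds under the C rules, however odd it looks. */
void check_examples(void) {
    assert(sum_is_positive(1u, -2));          /* 1 + (-2) wraps to UINT_MAX */
    assert(!difference_is_negative(1u, 2u));  /* 1 - 2 wraps to UINT_MAX */
    assert(!int_less_than_unsigned(-1, 1u));  /* -1 compares as UINT_MAX */
    assert(negate(1u) == UINT_MAX);           /* -1u is UINT_MAX */
}
```

Calling check_examples() from main should pass silently. Most compilers will warn about the mixed comparisons, which is about the only hint a beginner gets.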

I think it's fine. That's the way LLVM stores integral values  
internally, IIRC.

But what is the type of -u? If it is undecided, then the following should  
compile:

uint u = 100;
uint s = -u; // undecided sign is implicitly convertible to unsigned
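
For comparison, here is roughly what the C analogue of that snippet does today. It is a sketch only: negate_u32 is a hypothetical helper name, and uint32_t stands in for D's uint.

```c
#include <stdint.h>

/* C analogue of the D snippet above: negate a 32-bit unsigned value.
   The stored result is reduced modulo 2^32, so for u != 0 it is 2^32 - u. */
uint32_t negate_u32(uint32_t u) {
    return -u;
}
```

With u = 100 this returns 4294967196, i.e. 2^32 - 100, and nothing in the code signals that a negative value was ever involved.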


