Treating the abusive unsigned syndrome

Andrei Alexandrescu SeeWebsiteForEmail at erdani.org
Tue Nov 25 08:19:12 PST 2008


Denis Koroskin wrote:
> On Tue, 25 Nov 2008 18:59:01 +0300, Andrei Alexandrescu 
> <SeeWebsiteForEmail at erdani.org> wrote:
> 
>> D pursues compatibility with C and C++ in the following manner: if a 
>> code snippet compiles in both C and D or C++ and D, then it should 
>> have the same semantics.
>>
>> A classic problem with C and C++ integer arithmetic is that any 
>> operation involving at least one unsigned integral automatically 
>> receives an unsigned type, regardless of how silly that actually 
>> is, semantically. About the only advantage of this rule is that it's 
>> simple. IMHO it only has disadvantages from then on.
>>
>> The following operations suffer from the "abusive unsigned syndrome" 
>> (u is an unsigned integral, i is a signed integral):
>>
>> (1) u + i, i + u
>> (2) u - i, i - u
>> (3) u - u
>> (4) u * i, i * u, u / i, i / u, u % i, i % u (compatibility with C 
>> requires that these all return unsigned, ouch)
>> (5) u < i, i < u, u <= i etc. (all ordering comparisons)
>> (6) -u
>>
>> Logic operations &, |, and ^ also yield unsigned, but such cases are 
>> less abusive because at least the operation wasn't arithmetic in the 
>> first place. Comparing for equality is also quite a conundrum - should 
>> minus two billion compare equal to 2_294_967_296? I'll ignore these 
>> for now and focus on (1) - (6).
>>
>> So far we haven't found a solid solution to this problem that at the 
>> same time allows "good" code to pass through, weeds out "bad" code, and 
>> is compatible with C and C++. The closest I got was to have the 
>> compiler define the following internal types:
>>
>> __intuint
>> __longulong
>>
>> I've called them "dual-signed integers" in the past, but let's try the 
>> shorter "undecided sign". Each of these is a subtype of both the 
>> signed and the unsigned integral in its name, e.g. __intuint is a 
>> subtype of both int and uint. (Originally I thought of defining 
>> __byteubyte and __shortushort as well but dropped them in the interest 
>> of simplicity.)
>>
>> The sign-ambiguous operations (1) - (6) yield __intuint if no operand 
>> size was larger than 32 bits, and __longulong otherwise. Undecided 
>> sign types define their own operations. Let x and y be values of 
>> undecided sign. Then x + y, x - y, and -x also return a sign-ambiguous 
>> integral (the size is that of the largest operand). However, the other 
>> operators do not work on sign-ambiguous integrals, e.g. x / y would 
>> not compile because you must decide what sign x and y should have 
>> prior to invoking the operation. (Rationale: multiplication/division 
>> work differently depending on the signedness of their operands).
>>
>> User code cannot define a symbol of sign-ambiguous type, e.g.
>>
>> auto a = u + i;
>>
>> would not compile. However, given that __intuint is a subtype of both 
>> int and uint, it can be freely converted to either whenever there's no 
>> ambiguity:
>>
>> int a = u + i; // fine
>> uint b = u + i; // fine
>>
>> The advantage of this scheme is that it weeds out many (most? all?) 
>> surprises and oddities caused by the abusive unsigned rule of C and 
>> C++. The disadvantage is that it is more complex and may surprise the 
>> novice in its own way by refusing to compile code that looks legit.
>>
>> At the moment, we're in limbo regarding the decision to go forward 
>> with this. Walter, like many good long-time C programmers, knows the 
>> abusive unsigned rule so well he's not hurt by it and consequently has 
>> little incentive to see it as a problem. I have had to teach C and C++ 
>> to young students coming from Java introductory courses and have a 
>> more up-to-date perspective on the dangers. My strong belief is that 
>> we need to address this mess somehow, which type inference will only 
>> make more painful (in the hands of a beginner, auto can be quite a 
>> dangerous tool for wrong belief propagation). I also know seasoned 
>> programmers who had no idea that -u compiles and that it also oddly 
>> returns an unsigned type.
>>
>> Your opinions, comments, and suggestions for improvements would as 
>> always be welcome.
>>
>>
>> Andrei
> 
> I think it's fine. That's the way LLVM stores the integral values 
> internally, IIRC.
> 
> But what is the type of -u? If it is undecided, then the following 
> should compile:
> 
> uint u = 100;
> uint s = -u; // undecided implicitly convertible to unsigned

Yah, but at least you actively asked for an unsigned. Compare and 
contrast with surprises such as:

uint a = 5;
writeln(-a); // prints 4294967291, not -5

Such code would be disallowed in the undecided-sign regime.
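For reference, here is a minimal sketch of what today's rules do with cases (6), (3), and (5) from the list above. It assumes 32-bit uint and the current C-compatible conversions; the helper names negate, difference, and lessThan are made up for illustration and are not part of any proposal.

```d
import std.stdio : writeln;

// Hypothetical helpers, for illustration only.

// Case (6): unary minus on a uint stays uint and wraps modulo 2^32.
uint negate(uint u)
{
    return -u;
}

// Case (3): uint - uint is computed modulo 2^32, so a "negative"
// difference wraps around to a large positive value.
uint difference(uint a, uint b)
{
    return a - b;
}

// Case (5): in a mixed comparison the int operand is converted to
// uint before the comparison takes place.
bool lessThan(int i, uint u)
{
    return i < u;
}

void main()
{
    writeln(negate(5));        // 4294967291, not -5
    writeln(difference(3, 5)); // 4294967294, not -2
    writeln(lessThan(-1, 1));  // false: -1 is compared as 4294967295
}
```

In each helper the caller has at least spelled out the result type. The surprises come when the unsigned result is passed along implicitly, as in the writeln(-a) call above.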


Andrei


