Treating the abusive unsigned syndrome
Andrei Alexandrescu
SeeWebsiteForEmail at erdani.org
Tue Nov 25 08:19:12 PST 2008
Denis Koroskin wrote:
> On Tue, 25 Nov 2008 18:59:01 +0300, Andrei Alexandrescu
> <SeeWebsiteForEmail at erdani.org> wrote:
>
>> D pursues compatibility with C and C++ in the following manner: if a
>> code snippet compiles in both C and D or C++ and D, then it should
>> have the same semantics.
>>
>> A classic problem with C and C++ integer arithmetic is that any
>> operation involving at least one unsigned integral automatically
>> receives an unsigned type, regardless of how silly that actually is
>> semantically. About the only advantage of this rule is that it's
>> simple. IMHO it only has disadvantages from then on.
>>
>> The following operations suffer from the "abusive unsigned syndrome"
>> (u is an unsigned integral, i is a signed integral):
>>
>> (1) u + i, i + u
>> (2) u - i, i - u
>> (3) u - u
>> (4) u * i, i * u, u / i, i / u, u % i, i % u (compatibility with C
>> requires that these all return unsigned, ouch)
>> (5) u < i, i < u, u <= i etc. (all ordering comparisons)
>> (6) -u
>>
>> Logic operations &, |, and ^ also yield unsigned, but such cases are
>> less abusive because at least the operation wasn't arithmetic in the
>> first place. Comparing for equality is also quite a conundrum - should
>> minus two billion compare equal to 2_294_967_296? I'll ignore these
>> for now and focus on (1) - (6).
>>
>> So far we haven't found a solid solution to this problem that at the
>> same time allows "good" code to pass through, weeds out "bad" code,
>> and is compatible with C and C++. The closest I got was to have the
>> compiler define the following internal types:
>>
>> __intuint
>> __longulong
>>
>> I've called them "dual-signed integers" in the past, but let's try the
>> shorter "undecided sign". Each of these is a subtype of both the
>> signed and the unsigned integral in its name, e.g. __intuint is a
>> subtype of both int and uint. (Originally I thought of defining
>> __byteubyte and __shortushort as well but dropped them in the interest
>> of simplicity.)
>>
>> The sign-ambiguous operations (1) - (6) yield __intuint if no operand
>> size was larger than 32 bits, and __longulong otherwise. Undecided
>> sign types define their own operations. Let x and y be values of
>> undecided sign. Then x + y, x - y, and -x also return a sign-ambiguous
>> integral (the size is that of the largest operand). However, the other
>> operators do not work on sign-ambiguous integrals, e.g. x / y would
>> not compile because you must decide what sign x and y should have
>> prior to invoking the operation. (Rationale: multiplication/division
>> work differently depending on the signedness of their operands).
>>
>> User code cannot define a symbol of sign-ambiguous type, e.g.
>>
>> auto a = u + i;
>>
>> would not compile. However, given that __intuint is a subtype of both
>> int and uint, it can be freely converted to either whenever there's no
>> ambiguity:
>>
>> int a = u + i; // fine
>> uint b = u + i; // fine
>>
>> The advantage of this scheme is that it weeds out many (most? all?)
>> surprises and oddities caused by the abusive unsigned rule of C and
>> C++. The disadvantage is that it is more complex and may surprise the
>> novice in its own way by refusing to compile code that looks legit.
>>
>> At the moment, we're in limbo regarding the decision to go forward
>> with this. Walter, like many good long-time C programmers, knows the
>> abusive unsigned rule so well he's not hurt by it and consequently has
>> little incentive to see it as a problem. I have had to teach C and C++
>> to young students coming from Java introductory courses and have a
>> more up-to-date perspective on the dangers. My strong belief is that
>> we need to address this mess somehow, which type inference will only
>> make more painful (in the hands of a beginner, auto can be quite a
>> dangerous tool for wrong belief propagation). I also know seasoned
>> programmers who had no idea that -u compiles and that it also oddly
>> returns an unsigned type.
>>
>> Your opinions, comments, and suggestions for improvements would as
>> always be welcome.
>>
>>
>> Andrei
>
> I think it's fine. That's the way LLVM stores integral values
> internally, IIRC.
>
> But what is the type of -u? If it is undecided, then the following
> should compile:
>
> uint u = 100;
> uint s = -u; // undecided implicitly convertible to unsigned

Yah, but at least you actively asked for an unsigned. Compare and
contrast with surprises such as:

uint a = 5;
writeln(-a); // prints 4294967291, not -5

Such code would be disallowed in the undecided-sign regime.
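
For reference, here is a minimal sketch in C of three of the operations from the list in the original post: (3), (5), and (6). It only shows today's "abusive unsigned" behavior, not the proposed undecided-sign types. The function names are hypothetical, chosen just for illustration, and the sketch assumes a platform where int is 32 bits wide, so that uint32_t and int32_t correspond to unsigned int and int.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Case (6), -u: the result stays unsigned and wraps modulo 2^32. */
uint32_t negate_unsigned(uint32_t u) { return -u; }

/* Case (5), i < u: i is converted to unsigned before the comparison. */
bool mixed_less(int32_t i, uint32_t u) { return i < u; }

/* Case (3), u - u: a "negative" difference wraps to a large value. */
uint32_t unsigned_diff(uint32_t a, uint32_t b) { return a - b; }

/* Hypothetical self-check spelling out the surprising results. */
void check_examples(void) {
    assert(negate_unsigned(5u) == 4294967291u);   /* not -5 */
    assert(!mixed_less(-1, 1u));                  /* -1 < 1 comes out false */
    assert(unsigned_diff(3u, 5u) == 4294967294u); /* not -2 */
}
```

Calling check_examples() from main passes silently on such a platform. Most compilers can warn about the mixed comparison, but the code is still accepted.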

Andrei