Treating the abusive unsigned syndrome

Thu Nov 27 08:49:02 PST 2008

Don wrote:
> Andrei Alexandrescu wrote:
>> Don wrote:
>>> Andrei Alexandrescu wrote:
>>>> One fear of mine is the reaction of throwing hands in the air 
>>>> "how many integral types are enough???". However, if we're to judge 
>>>> by the addition of long long and a slew of typedefs to C99 and 
>>>> C++0x, the answer is "plenty". I'd be interested in gauging how 
>>>> people feel about adding two (bits64, bits32) or even four (bits64, 
>>>> bits32, bits16, and bits8) types as basic types. They'd be bitbags 
>>>> with undecided sign ready to be converted to their counterparts of 
>>>> decided sign.
>>>
>>> Here I think we have a fundamental disagreement: what is an 'unsigned 
>>> int'? There are two disparate ideas:
>>>
>>> (A) You think that it is an approximation to a natural number, ie, a 
>>> 'positive int'.
>>> (B) I think that it is a 'number with NO sign'; that is, the sign 
>>> depends on context. It may, for example, be part of a larger number. 
>>> Thus, I largely agree with the C behaviour -- once you have an 
>>> unsigned in a calculation, it's up to the programmer to provide an 
>>> interpretation.
>>>
>>> Unfortunately, the two concepts are mashed together in C-family 
>>> languages. (B) is the concept supported by the language typing rules, 
>>> but usage of (A) is widespread in practice.
>>
>> In fact we are in agreement. C tries to make it usable as both, and 
>> partially succeeds by having very lax conversions in all directions. 
>> This leads to the occasional puzzling behaviors. I do *want* uint to 
>> be an approximation of a natural number, while acknowledging that 
>> today it isn't much of that.
>>
>>> If we were going to introduce a slew of new types, I'd want them to 
>>> be for 'positive int'/'natural int', 'positive byte', etc.
>>>
>>> Natural int can always be implicitly converted to either int or uint, 
>>> with perfect safety. No other conversions are possible without a cast.
>>> Non-negative literals and manifest constants are naturals.
>>>
>>> The rules are:
>>> 1. Anything involving unsigned is unsigned, (same as C).
>>> 2. Else if it contains an integer, it is an integer.
>>> 3. (Now we know all quantities are natural):
>>> If it contains a subtraction, it is an integer [Probably allow 
>>> subtraction of compile-time quantities to remain natural, if the 
>>> values stay in range; flag an error if an overflow occurs].
>>> 4. Else it is a natural.
>>>
>>>
>>> The reason I think literals and manifest constants are so important 
>>> is that they are a significant fraction of the natural numbers in a 
>>> program.
>>>
>>> [Just before posting I've discovered that other people have posted 
>>> some similar ideas].
>>
>> That sounds encouraging. One problem is that your approach leaves the 
>> unsigned mess as it is, so although natural types are a nice addition, 
>> they don't bring a complete solution to the table.
>>
>>
>> Andrei
> 
> Well, it does make unsigned numbers (case (B)) quite obscure and 
> low-level. They could be renamed with uglier names to make this clearer.
> But since in this proposal there are no implicit conversions from uint 
> to anything, it's hard to do any damage with the unsigned type which 
> results.
> Basically, with any use of unsigned, the compiler says "I don't know if 
> this thing even has a meaningful sign!".
> 
> Alternatively, we could add rule 0: mixing int and unsigned is illegal. 
> But it's OK to mix natural with int, or natural with unsigned.
> I don't like this as much, since it would make most usage of unsigned 
> ugly; but maybe that's justified.

I think we're heading towards an impasse. We wouldn't want to make 
things much harder for systems-level programs that mix arithmetic and 
bit-level operations.

I'm glad there is interest and that quite a few ideas were brought up. 
Unfortunately, it looks like all of them have significant disadvantages.

One compromise solution Walter and I discussed in the past is to only 
sever one of the dangerous implicit conversions: int -> uint. Other than 
that, it's much like C (everything involving one unsigned is unsigned, 
and unsigned -> signed is implicit). Let's see where that takes us.
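
For reference, here is a minimal C sketch of the int -> uint conversion 
that this compromise would sever. It only shows what C does today; the 
function names are made up for illustration and are not part of any 
proposal. The assertions hold on any conforming C implementation.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: takes an unsigned parameter, as a length or
   count typically would. */
static unsigned int pass_through(unsigned int n) { return n; }

/* Mixed comparison: the int operand is implicitly converted to
   unsigned int before comparing, so a negative value compares as a
   very large one. */
static bool int_less_than_uint(int i, unsigned int u) {
    return i < u;
}

static void check_int_to_uint_conversions(void) {
    int small = -1;

    /* Compiles without a cast in C; the value is converted modulo
       UINT_MAX + 1, so -1 arrives as UINT_MAX. */
    assert(pass_through(small) == UINT_MAX);

    /* -1 < 1u is false in C, because -1 is first converted to UINT_MAX. */
    assert(!int_less_than_uint(-1, 1u));
}
```

Under the compromise, both the call and the comparison would presumably 
need an explicit cast on the int operand, since int -> uint would no 
longer be implicit.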

(a) There are fewer situations when a small, reasonable number 
implicitly becomes a large, weird number.

(b) An exception to (a) is that u1 - u2 is also uint, and that's for the 
sake of C compatibility. I'd gladly drop it if I could and let 
operations such as u1 - u2 return a signed number. That assumes the 
least and works with small, usual values.

(c) Unlike C, arithmetic and logical operations always return the 
tightest type possible, not a 32/64-bit value. For example, byte / int 
yields byte and so on.
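
As a point of comparison for (b) and (c), the following C sketch shows 
the current C behavior that these points keep or depart from. It is 
only an illustration: the function names are hypothetical, and signed 
char stands in for D's byte as an assumption.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* (b) Unsigned subtraction wraps modulo UINT_MAX + 1, so the result
   is never negative, even when u1 < u2. */
static unsigned int unsigned_difference(unsigned int u1, unsigned int u2) {
    return u1 - u2;
}

/* (c) In C, operands narrower than int are promoted to int before any
   arithmetic, so the expression below has type int. sizeof does not
   evaluate its operand; it only reports the size of the result type. */
static size_t size_of_byte_div_int(void) {
    signed char b = 100;   /* stand-in for D's byte */
    int i = 3;
    return sizeof(b / i);
}

static void check_c_arithmetic_types(void) {
    /* 2 - 3 on unsigned operands yields UINT_MAX, not -1. */
    assert(unsigned_difference(2u, 3u) == UINT_MAX);

    /* The quotient has the size of int, not of signed char. */
    assert(size_of_byte_div_int() == sizeof(int));
}
```

Point (b) keeps the first behavior for compatibility. Point (c) would 
change the second, so that the quotient takes the narrower operand's 
type instead of being widened to int.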

What do you think?

Andrei