Treating the abusive unsigned syndrome

Thu Nov 27 13:24:24 PST 2008

Andrei Alexandrescu wrote:
> Don wrote:
>> Andrei Alexandrescu wrote:
>>> Don wrote:
>>>> Andrei Alexandrescu wrote:
>>>>> One fear of mine is the reaction of throwing hands in the air: 
>>>>> "how many integral types are enough???". However, if we're to judge 
>>>>> by the addition of long long and a slew of typedefs to C99 and 
>>>>> C++0x, the answer is "plenty". I'd be interested in gauging how 
>>>>> people feel about adding two (bits64, bits32) or even four (bits64, 
>>>>> bits32, bits16, and bits8) types as basic types. They'd be bitbags 
>>>>> with undecided sign ready to be converted to their counterparts of 
>>>>> decided sign.
>>>>
>>>> Here I think we have a fundamental disagreement: what is an 
>>>> 'unsigned int'? There are two disparate ideas:
>>>>
>>>> (A) You think that it is an approximation to a natural number, i.e., a 
>>>> 'positive int'.
>>>> (B) I think that it is a 'number with NO sign'; that is, the sign 
>>>> depends on context. It may, for example, be part of a larger number. 
>>>> Thus, I largely agree with the C behaviour -- once you have an 
>>>> unsigned in a calculation, it's up to the programmer to provide an 
>>>> interpretation.
>>>>
>>>> Unfortunately, the two concepts are mashed together in C-family 
>>>> languages. (B) is the concept supported by the language typing 
>>>> rules, but usage of (A) is widespread in practice.
>>>
>>> In fact we are in agreement. C tries to make it usable as both, and 
>>> partially succeeds by having very lax conversions in all directions. 
>>> This leads to occasional puzzling behaviors. I do *want* uint to 
>>> be an approximation of a natural number, while acknowledging that 
>>> today it isn't much of that.
>>>
>>>> If we were going to introduce a slew of new types, I'd want them to 
>>>> be for 'positive int'/'natural int', 'positive byte', etc.
>>>>
>>>> Natural int can always be implicitly converted to either int or 
>>>> uint, with perfect safety. No other conversions are possible without 
>>>> a cast.
>>>> Non-negative literals and manifest constants are naturals.
>>>>
>>>> The rules are:
>>>> 1. Anything involving unsigned is unsigned (same as C).
>>>> 2. Else if it contains an integer, it is an integer.
>>>> 3. (Now we know all quantities are natural):
>>>> If it contains a subtraction, it is an integer [Probably allow 
>>>> subtraction of compile-time quantities to remain natural, if the 
>>>> values stay in range; flag an error if an overflow occurs].
>>>> 4. Else it is a natural.
>>>>
>>>>
>>>> The reason I think literals and manifest constants are so important 
>>>> is that they are a significant fraction of the natural numbers in a 
>>>> program.
>>>>
>>>> [Just before posting I've discovered that other people have posted 
>>>> some similar ideas].
>>>
>>> That sounds encouraging. One problem is that your approach leaves the 
>>> unsigned mess as it is, so although natural types are a nice 
>>> addition, they don't bring a complete solution to the table.
>>>
>>>
>>> Andrei
>>
>> Well, it does make unsigned numbers (case (B)) quite obscure and 
>> low-level. They could be renamed with uglier names to make this clearer.
>> But since in this proposal there are no implicit conversions from uint 
>> to anything, it's hard to do any damage with the unsigned type which 
>> results.
>> Basically, with any use of unsigned, the compiler says "I don't know 
>> if this thing even has a meaningful sign!".
>>
>> Alternatively, we could add rule 0: mixing int and unsigned is 
>> illegal. But it's OK to mix natural with int, or natural with unsigned.
>> I don't like this as much, since it would make most usage of unsigned 
>> ugly; but maybe that's justified.
> 
> I think we're heading towards an impasse. We wouldn't want to make 
> things much harder for systems-level programs that mix arithmetic and 
> bit-level operations.
> 
> I'm glad there is interest and that quite a few ideas were brought up. 
> Unfortunately, it looks like all have significant disadvantages.
> 
> One compromise solution Walter and I discussed in the past is to only 
> sever one of the dangerous implicit conversions: int -> uint. Other than 
> that, it's much like C (everything involving one unsigned is unsigned 
> and unsigned -> signed is implicit). Let's see where that takes us.
> 
> (a) There are fewer situations when a small, reasonable number 
> implicitly becomes a large, weird number.
> 
> (b) An exception to (a) is that u1 - u2 is also uint, and that's for the 
> sake of C compatibility. I'd gladly drop it if I could and let 
> operations such as u1 - u2 return a signed number. That assumes the 
> least and works with small, usual values.
> 
> (c) Unlike C, arithmetic and logical operations always return the 
> tightest type possible, not a 32/64 bit value. For example, byte / int 
> yields byte and so on.
> 

So you mean long * int (e.g. 1234567890123L * 2) will return an int 
instead of a long?!

The opposite sounds more natural to me.
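
To make the concern concrete, here is a minimal sketch in C rather than D. It 
shows what today's usual arithmetic conversions do with that product, next to 
what would survive if the result had to fit in 32 bits. The helper names 
(product_wide, product_low32, check_example) are made up for illustration, 
and the second helper is only my reading of "tightest type", not anything 
from the proposal.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the product as C computes it today.
   `b` is converted to long long before the multiplication,
   so the result has the wider operand's type. */
long long product_wide(long long a, int b) {
    return a * b;
}

/* Hypothetical helper: what is left if the same product had to
   fit in a 32-bit result. Only the low 32 bits survive. */
uint32_t product_low32(long long a, int b) {
    /* Multiply in unsigned 64-bit so the arithmetic cannot overflow. */
    uint64_t full = (uint64_t)a * (uint64_t)(long long)b;
    return (uint32_t)full;
}

/* Sanity check using the numbers from the post. */
void check_example(void) {
    assert(product_wide(1234567890123LL, 2) == 2469135780246LL);
    /* The truncated value is no longer the mathematical product. */
    assert(product_low32(1234567890123LL, 2) != 2469135780246LL);
}
```

With the wider result type the product is exact; with a 32-bit result it 
cannot be, which is why widening looks like the more natural default to me.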

> What do you think?
> 
> 
> Andrei
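
For reference, points (a) and (b) can be sketched in C, whose current behaviour 
is what the compromise starts from. This is only an illustration under the 
assumption that unsigned is narrower than long long (true on common 
platforms); the function names are invented for the example and are not part 
of any proposal.

```c
#include <assert.h>
#include <limits.h>

/* Point (a): implicit int -> unsigned. C accepts this silently;
   the compromise would require a cast here. */
unsigned from_int(int x) {
    return x;   /* -1 becomes UINT_MAX */
}

/* Point (b): u1 - u2 stays unsigned and wraps modulo 2^N
   when u2 is larger than u1. */
unsigned diff_unsigned(unsigned u1, unsigned u2) {
    return u1 - u2;
}

/* The signed alternative mentioned in (b): widen both operands,
   then subtract. Assumes long long is wider than unsigned. */
long long diff_signed(unsigned u1, unsigned u2) {
    return (long long)u1 - (long long)u2;
}

/* Small values are enough to show both effects. */
void check_unsigned_examples(void) {
    assert(from_int(-1) == UINT_MAX);
    assert(diff_unsigned(2u, 3u) == UINT_MAX);
    assert(diff_signed(2u, 3u) == -1);
}
```

The first two checks are the "small, reasonable number becomes a large, weird 
number" case; the third is what a signed result for u1 - u2 would give.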