Treating the abusive unsigned syndrome
KennyTM~
kennytm at gmail.com
Thu Nov 27 13:31:11 PST 2008
KennyTM~ wrote:
> Andrei Alexandrescu wrote:
>> Don wrote:
>>> Andrei Alexandrescu wrote:
>>>> Don wrote:
>>>>> Andrei Alexandrescu wrote:
>>>>>> One fear of mine is the reaction of throwing one's hands in the air:
>>>>>> "how many integral types are enough???". However, if we're to
>>>>>> judge by the addition of long long and a slew of typedefs to C99
>>>>>> and C++0x, the answer is "plenty". I'd be interested in gauging how
>>>>>> people feel about adding two (bits64, bits32) or even four
>>>>>> (bits64, bits32, bits16, and bits8) types as basic types. They'd
>>>>>> be bitbags with undecided sign ready to be converted to their
>>>>>> counterparts of decided sign.
>>>>>
>>>>> Here I think we have a fundamental disagreement: what is an
>>>>> 'unsigned int'? There are two disparate ideas:
>>>>>
>>>>> (A) You think that it is an approximation to a natural number, ie,
>>>>> a 'positive int'.
>>>>> (B) I think that it is a 'number with NO sign'; that is, the sign
>>>>> depends on context. It may, for example, be part of a larger
>>>>> number. Thus, I largely agree with the C behaviour -- once you have
>>>>> an unsigned in a calculation, it's up to the programmer to provide
>>>>> an interpretation.
>>>>>
>>>>> Unfortunately, the two concepts are mashed together in C-family
>>>>> languages. (B) is the concept supported by the language typing
>>>>> rules, but usage of (A) is widespread in practice.
>>>>
>>>> In fact we are in agreement. C tries to make it usable as both, and
>>>> partially succeeds by having very lax conversions in all directions.
>>>> This leads to occasional puzzling behavior. I do *want* uint to
>>>> be an approximation of a natural number, while acknowledging that
>>>> today it isn't much of that.
>>>>
>>>>> If we were going to introduce a slew of new types, I'd want them to
>>>>> be for 'positive int'/'natural int', 'positive byte', etc.
>>>>>
>>>>> Natural int can always be implicitly converted to either int or
>>>>> uint, with perfect safety. No other conversions are possible
>>>>> without a cast.
>>>>> Non-negative literals and manifest constants are naturals.
>>>>>
>>>>> The rules are:
>>>>> 1. Anything involving unsigned is unsigned (same as C).
>>>>> 2. Else if it contains an integer, it is an integer.
>>>>> 3. (Now we know all quantities are natural):
>>>>> If it contains a subtraction, it is an integer [Probably allow
>>>>> subtraction of compile-time quantities to remain natural, if the
>>>>> values stay in range; flag an error if an overflow occurs].
>>>>> 4. Else it is a natural.
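>>>>>
>>>>> For example (a rough sketch only; 'natural' is the proposed type, not
>>>>> anything D has today):
>>>>>
>>>>>     natural n = 3;     // non-negative literal: natural
>>>>>     uint u;  int i;
>>>>>     auto a = n + u;    // rule 1: involves unsigned -> uint
>>>>>     auto b = n + i;    // rule 2: involves int -> int
>>>>>     auto c = n - 1;    // rule 3: contains a subtraction -> int
>>>>>     enum e = 3 - 1;    // ...though a compile-time subtraction that
>>>>>                        //    stays in range could remain natural
>>>>>     auto d = n + 1;    // rule 4: all operands natural -> natural
>>>>>     i = d;  u = d;     // natural implicitly converts to int or uint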
>>>>>
>>>>>
>>>>> The reason I think literals and manifest constants are so important
>>>>> is that they are a significant fraction of the natural numbers in a
>>>>> program.
>>>>>
>>>>> [Just before posting I've discovered that other people have posted
>>>>> some similar ideas].
>>>>
>>>> That sounds encouraging. One problem is that your approach leaves
>>>> the unsigned mess as it is, so although natural types are a nice
>>>> addition, they don't bring a complete solution to the table.
>>>>
>>>>
>>>> Andrei
>>>
>>> Well, it does make unsigned numbers (case (B)) quite obscure and
>>> low-level. They could be renamed with uglier names to make this clearer.
>>> But since in this proposal there are no implicit conversions from
>>> uint to anything, it's hard to do any damage with the unsigned type
>>> which results.
>>> Basically, with any use of unsigned, the compiler says "I don't know
>>> if this thing even has a meaningful sign!".
>>>
>>> Alternatively, we could add rule 0: mixing int and unsigned is
>>> illegal. But it's OK to mix natural with int, or natural with unsigned.
>>> I don't like this as much, since it would make most usage of unsigned
>>> ugly; but maybe that's justified.
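>>>
>>> Sketch again (hypothetical types):
>>>
>>>     int i;  uint u;  natural n;
>>>     auto a = i + u;   // rule 0: error, mixing int and unsigned
>>>     auto b = i + n;   // OK: int
>>>     auto c = u + n;   // OK: uint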
>>
>> I think we're heading towards an impasse. We wouldn't want to make
>> things much harder for systems-level programs that mix arithmetic and
>> bit-level operations.
>>
>> I'm glad there is interest and that quite a few ideas were brought up.
>> Unfortunately, it looks like all have significant disadvantages.
>>
>> One compromise solution Walter and I discussed in the past is to only
>> sever one of the dangerous implicit conversions: int -> uint. Other
>> than that, it's much like C (everything involving one unsigned is
>> unsigned and unsigned -> signed is implicit). Let's see where that
>> takes us.
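>>
>> Concretely (a sketch of the compromise, not current behavior):
>>
>>     uint u;  int i;
>>     i = u;             // still implicit: unsigned -> signed, as in C
>>     u = i;             // error: the int -> uint conversion is severed
>>     u = cast(uint) i;  // an explicit cast is still fine
>>     auto x = u + i;    // still uint, as in C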
>>
>> (a) There are fewer situations when a small, reasonable number
>> implicitly becomes a large, weird number.
>>
>> (b) An exception to (a) is that u1 - u2 is also uint, and that's for
>> the sake of C compatibility. I'd gladly drop it if I could and let
>> operations such as u1 - u2 return a signed number. That assumes the
>> least and works with small, usual values.
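>>
>> For example, with today's rules:
>>
>>     uint u1 = 3, u2 = 5;
>>     auto d = u1 - u2;          // uint, as in C
>>     assert(d == 4294967294u);  // wraps around instead of yielding -2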
>>
>> (c) Unlike C, arithmetic and logical operations always return the
>> tightest type possible, not a 32/64 bit value. For example, byte / int
>> yields byte and so on.
>>
>
> So you mean long * int (e.g. 1234567890123L * 2) will return an int
> instead of a long?!
>
> The opposite sounds more natural to me.
>
Em, or do you mean the tightest type that can represent all possible
results? (so long*int == cent?)
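
E.g. (just to illustrate the two readings, with cent as the still-hypothetical
128-bit type):

    long l = 1234567890123L;  int i = 2;
    auto a = l * i;   // "tightest operand type" would give int, which
                      // overflows; the tightest type that fits every
                      // possible result of long * int would be cent.
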
>> What do you think?
>>
>>
>> Andrei