Treating the abusive unsigned syndrome

Don nospam at nospam.com
Fri Nov 28 04:42:42 PST 2008


Andrei Alexandrescu wrote:
> KennyTM~ wrote:
>> KennyTM~ wrote:
>>> Andrei Alexandrescu wrote:
>>>> Don wrote:
>>>>> Andrei Alexandrescu wrote:
>>>>>> Don wrote:
>>>>>>> Andrei Alexandrescu wrote:
>>>>>>>> One fear of mine is the reaction of throwing of hands in the air 
>>>>>>>> "how many integral types are enough???". However, if we're to 
>>>>>>>> judge by the addition of long long and a slew of typedefs to C99 
>>>>>>>> and C++0x, the answer is "plenty". I'd be interested in gauging 
>>>>>>>> how people feel about adding two (bits64, bits32) or even four 
>>>>>>>> (bits64, bits32, bits16, and bits8) types as basic types. They'd 
>>>>>>>> be bitbags with undecided sign ready to be converted to their 
>>>>>>>> counterparts of decided sign.
>>>>>>>
>>>>>>> Here I think we have a fundamental disagreement: what is an 
>>>>>>> 'unsigned int'? There are two disparate ideas:
>>>>>>>
>>>>>>> (A) You think that it is an approximation to a natural number, 
>>>>>>> ie, a 'positive int'.
>>>>>>> (B) I think that it is a 'number with NO sign'; that is, the sign 
>>>>>>> depends on context. It may, for example, be part of a larger 
>>>>>>> number. Thus, I largely agree with the C behaviour -- once you 
>>>>>>> have an unsigned in a calculation, it's up to the programmer to 
>>>>>>> provide an interpretation.
>>>>>>>
>>>>>>> Unfortunately, the two concepts are mashed together in C-family 
>>>>>>> languages. (B) is the concept supported by the language typing 
>>>>>>> rules, but usage of (A) is widespread in practice.
>>>>>>
>>>>>> In fact we are in agreement. C tries to make it usable as both, 
>>>>>> and partially succeeds by having very lax conversions in all 
>>>>>> directions. This leads to the occasional puzzling behaviors. I do 
>>>>>> *want* uint to be an approximation of a natural number, while 
>>>>>> acknowledging that today it isn't much of that.
>>>>>>
>>>>>>> If we were going to introduce a slew of new types, I'd want them 
>>>>>>> to be for 'positive int'/'natural int', 'positive byte', etc.
>>>>>>>
>>>>>>> Natural int can always be implicitly converted to either int or 
>>>>>>> uint, with perfect safety. No other conversions are possible 
>>>>>>> without a cast.
>>>>>>> Non-negative literals and manifest constants are naturals.
>>>>>>>
>>>>>>> The rules are:
>>>>>>> 1. Anything involving unsigned is unsigned (same as C).
>>>>>>> 2. Else if it contains an integer, it is an integer.
>>>>>>> 3. (Now we know all quantities are natural):
>>>>>>> If it contains a subtraction, it is an integer [Probably allow 
>>>>>>> subtraction of compile-time quantities to remain natural, if the 
>>>>>>> values stay in range; flag an error if an overflow occurs].
>>>>>>> 4. Else it is a natural.
>>>>>>>
>>>>>>>
>>>>>>> The reason I think literals and manifest constants are so 
>>>>>>> important is that they are a significant fraction of the natural 
>>>>>>> numbers in a program.
>>>>>>>
>>>>>>> [Just before posting I've discovered that other people have 
>>>>>>> posted some similar ideas].
>>>>>>
>>>>>> That sounds encouraging. One problem is that your approach leaves 
>>>>>> the unsigned mess as it is, so although natural types are a nice 
>>>>>> addition, they don't bring a complete solution to the table.
>>>>>>
>>>>>>
>>>>>> Andrei
>>>>>
>>>>> Well, it does make unsigned numbers (case (B)) quite obscure and 
>>>>> low-level. They could be renamed with uglier names to make this 
>>>>> clearer.
>>>>> But since in this proposal there are no implicit conversions from 
>>>>> uint to anything, it's hard to do any damage with the unsigned type 
>>>>> which results.
>>>>> Basically, with any use of unsigned, the compiler says "I don't 
>>>>> know if this thing even has a meaningful sign!".
>>>>>
>>>>> Alternatively, we could add rule 0: mixing int and unsigned is 
>>>>> illegal. But it's OK to mix natural with int, or natural with 
>>>>> unsigned.
>>>>> I don't like this as much, since it would make most usage of 
>>>>> unsigned ugly; but maybe that's justified.
>>>>
>>>> I think we're heading towards an impasse. We wouldn't want to make 
>>>> things much harder for systems-level programs that mix arithmetic 
>>>> and bit-level operations.
>>>>
>>>> I'm glad there is interest and that quite a few ideas were brought 
>>>> up. Unfortunately, it looks like all have significant disadvantages.
>>>>
>>>> One compromise solution Walter and I discussed in the past is to 
>>>> only sever one of the dangerous implicit conversions: int -> uint. 
>>>> Other than that, it's much like C (everything involving one unsigned 
>>>> is unsigned, and unsigned -> signed is implicit). Let's see where that 
>>>> takes us.
>>>>
>>>> (a) There are fewer situations when a small, reasonable number 
>>>> implicitly becomes a large, weird number.
>>>>
>>>> (b) An exception to (a) is that u1 - u2 is also uint, and that's for 
>>>> the sake of C compatibility. I'd gladly drop it if I could and let 
>>>> operations such as u1 - u2 return a signed number. That assumes the 
>>>> least and works with small, usual values.

The problem with that is that you're then forcing the 'unsigned is a 
natural' interpretation even when it may be erroneous.

uint.max - 10 is a uint.

It's an interesting case, because int = u1 - u2 is definitely incorrect 
when u1 > int.max.

uint = u1 - u2 may be incorrect when u1 < u2, _if you think of unsigned 
as a positive number_.
But if you think of it as a natural number modulo 2^32, uint = u1 - u2 
is always correct, since that's what's happening mathematically.
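To make both cases concrete, here's a quick D sketch of the current
behaviour (the variable names and values are mine, purely illustrative):

import std.stdio;

void main()
{
    uint u1 = 3, u2 = 5;

    // Modulo-2^32 view: the subtraction simply wraps.
    uint udiff = u1 - u2;
    writeln(udiff);          // 4294967294, i.e. -2 mod 2^32

    // Signed view: fine here, because the true difference fits in an int.
    int sdiff = u1 - u2;
    writeln(sdiff);          // -2

    // But when u1 > int.max, the signed reading is definitely wrong:
    uint big = uint.max - 10;
    int oops = big - u2;     // true value 4294967280 can't fit in an int
    writeln(oops);           // -16
}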

I'm strongly of the opinion that you shouldn't be able to generate an 
unsigned accidentally -- you should need to either declare a type as 
uint, or use the 'u' suffix on a literal.
Right now, because properties like 'length' are uint, you get too many 
surprising uints, especially when using 'auto'. For example:
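(A sketch; I'm assuming a 32-bit size_t here, where 'length' is uint --
on 64-bit the same thing happens with ulong.)

import std.stdio;

void main()
{
    int[] a = [1, 2, 3];
    int[] b = [1, 2, 3, 4, 5];

    // 'length' is unsigned, so 'auto' silently infers an unsigned
    // type, and the subtraction wraps instead of going negative:
    auto n = a.length - b.length;
    writeln(typeof(n).stringof); // "uint" on 32-bit ("ulong" on 64-bit)
    writeln(n);                  // a huge positive value, not -2
}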

I take your point about not wanting to give up the full 32 bits of 
address space. The problem is that if you have an object x which 
is >2GB, and a small object y, then x.length - y.length will 
erroneously be negative. If we want code (especially in libraries) to 
cope with such large objects, we need to ensure that any time there's 
a subtraction involving a length, the first operand is larger than the 
second. I think that would preclude the combination:

1. length is uint;
2. byte[].length can exceed 2GB, and code is correct when it does;
3. uint - uint is an int (or even can implicitly convert to int).

As far as I can tell, at least one of these has to go.
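
Here's a sketch of the failure mode (hypothetical lengths; the explicit
cast stands in for the proposed implicit uint -> int result):

void main()
{
    // Hypothetical 32-bit scenario: xLen stands in for x.length on a
    // >2GB object, yLen for a small object's length.
    uint xLen = 2_500_000_000; // exceeds int.max (2_147_483_647)
    uint yLen = 10;

    // Under the proposed "uint - uint is int" rule, the result would be
    // read as signed. The true difference, 2_499_999_990, doesn't fit
    // in an int, so the bit pattern comes out negative:
    int diff = cast(int)(xLen - yLen); // cast simulates the proposal
    assert(diff == -1_794_967_306);    // 2_499_999_990 - 2^32
    assert(diff < 0);                  // erroneously negative
}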


