Treating the abusive unsigned syndrome

Don nospam at nospam.com
Fri Nov 28 08:44:39 PST 2008


Andrei Alexandrescu wrote:
> Don wrote:
>> Andrei Alexandrescu wrote:
>>> (I lost track of quotes, so I yanked them all beyond Don's message.)
>>>
>>> Don wrote:
>>>> The problem with that, is that you're then forcing the 'unsigned is 
>>>> a natural' interpretation when it may be erroneous.
>>>>
>>>> uint.max - 10 is a uint.
>>>>
>>>> It's an interesting case, because int = u1 - u2 can be 
>>>> incorrect when u1 > int.max: the true difference may not fit in an int.
>>>>
>>>> uint = u1 - u2 may be incorrect when u1 < u2, _if you think of 
>>>> unsigned as a positive number_.
>>>> But, if you think of it as a natural modulo 2^32, uint = u1 - u2 is 
>>>> always correct, since that's what's happening mathematically.
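
To see the modulo view in code (a minimal example; the values are arbitrary):

import std.stdio;

void main()
{
    uint u1 = 3, u2 = 5;
    uint d = u1 - u2;     // wraps: 3 - 5 = 2^32 - 2 (mod 2^32)
    writeln(d);           // 4294967294, i.e. 0xFFFF_FFFE
    writeln(cast(int)d);  // -2, the true difference, since it fits in an int
}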
>>>
>>> Sounds good. One important consideration is that modulo arithmetic is 
>>> considerably easier to understand when two's complement and signs are 
>>> not involved.
>>>
>>>> I'm strongly of the opinion that you shouldn't be able to generate 
>>>> an unsigned accidentally -- you should need to either declare a type 
>>>> as uint, or use the 'u' suffix on a literal.
>>>> Right now, properties like 'length' being uint means you get too 
>>>> many surprising uints, especially when using 'auto'.
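
For example, assuming a 32-bit target where length is uint:

import std.stdio;

void main()
{
    int[] arr;             // empty array
    auto n = arr.length;   // n is inferred as uint, not int
    auto d = n - 1;        // also uint: wraps around
    writeln(d);            // prints 4294967295 rather than -1
}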
>>>
>>> I am not surprised by length being unsigned. I'm also not surprised 
>>> by hexadecimal constants being unsigned. (They are unsigned in C. 
>>> Walter made them signed or not, depending on their value.)
>>>
>>>> I take your point about not wanting to give up the full 32 bits of 
>>>> address space. The problem is that if you have an object x which is 
>>>> larger than 2GB, and a small object y, then x.length - y.length will 
>>>> erroneously be negative. If we want code (especially in libraries) 
>>>> to cope with such large objects, we need to ensure that any time 
>>>> there's a subtraction involving a length, the first operand is larger 
>>>> than the second. I think that would preclude the combination:
>>>>
>>>> length is uint
>>>> byte[].length can exceed 2GB, and code is correct when it does
>>>> uint - uint is an int (or even, can implicitly convert to int)
>>>>
>>>> As far as I can tell, at least one of these has to go.
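
For what it's worth, such subtractions can often be replaced by comparisons. A sketch with hypothetical arrays x and y, widening to ulong so the addition cannot wrap:

// instead of: if (x.length - y.length < 100)
if (cast(ulong)x.length < cast(ulong)y.length + 100)
{
    // holds exactly when the true difference is < 100,
    // even when either length exceeds 2GB
}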
>>>
>>> Well, none has to go in the latest design:
>>>
>>> (a) One unsigned makes everything unsigned
>>>
>>> (b) unsigned -> signed is allowed
>>>
>>> (c) signed -> unsigned is disallowed
>>>
>>> Of course the latest design has imperfections, but it precludes none 
>>> of the three things you mention.
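
In code, the three rules would read something like this (a sketch of the proposed design, not what today's compiler does):

uint u = 42;
int  s = -1;
int  i = u;      // (b) unsigned -> signed: allowed
uint v = s;      // (c) signed -> unsigned: rejected
auto m = u + i;  // (a) one unsigned operand makes the whole expression unsigned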
>>
>> It's close, but how can code such as:
>>
>> if (x.length - y.length < 100) ...
>>
>> be correct in the presence of length > 2GB?
>>
>> since
>> (a) x.length  = uint.max, y.length = 1
>> (b) x.length = 4, y.length = 2
>> both produce the same binary result (0xFFFF_FFFE = -2)
> 
> (You mean x.length = 2, y.length = 4 in the second case.)

Yes.
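
Both cases are easy to check (a minimal, self-contained example):

import std.stdio;

void main()
{
    uint a1 = uint.max, b1 = 1;
    uint a2 = 2,        b2 = 4;
    writefln("%08X", a1 - b1);  // FFFFFFFE
    writefln("%08X", a2 - b2);  // FFFFFFFE -- indistinguishable from the first
}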

> 
>> Any subtraction of two lengths has a possible range of
>>  -uint.max .. uint.max
>> which is quite problematic (and the root cause of the problems, I guess).
>> And unfortunately I think code is riddled with subtraction of lengths.
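
When the signed difference itself is what's wanted, widening to long represents the whole -uint.max .. uint.max range exactly (again with hypothetical x and y):

long diff = cast(long)x.length - cast(long)y.length;
if (diff < 100) ...  // correct for any pair of lengths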
> 
> Code may be riddled with subtraction of lengths, but it seems to be 
> working with today's rule that the result of that subtraction is 
> unsigned. So we're definitely not introducing new problems.

Yes. I think much existing code would fail with sizes over 2GB, though. 
But it's not any worse.

> 
> I agree the solution has problems. Following this thread, which in turn 
> follows my sleepless nights poring over the subject, I'm glad to have 
> reached a design that is better than what we currently have. I think that 
> disallowing signed -> unsigned conversions will be a net improvement.

I agree. And dealing with compile-time constants will improve things 
even more.
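
For example, if the compiler checked the value of a constant and not just its type, conversions could be accepted or rejected case by case (a sketch of the idea, not existing behavior):

uint u = 100;   // fine: the constant provably fits in uint
uint v = -1;    // rejected: a negative constant can't be a uint
short s = 40;   // fine: 40 fits in short, even though the literal is an int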


