Treating the abusive unsigned syndrome
Andrei Alexandrescu
SeeWebsiteForEmail at erdani.org
Fri Nov 28 08:29:41 PST 2008
Don wrote:
> Andrei Alexandrescu wrote:
>> (I lost track of quotes, so I yanked them all beyond Don's message.)
>>
>> Don wrote:
>>> The problem with that, is that you're then forcing the 'unsigned is a
>>> natural' interpretation when it may be erroneous.
>>>
>>> uint.max - 10 is a uint.
>>>
>>> It's an interesting case, because int = u1 - u2 is definitely
>>> incorrect when u1 > int.max.
>>>
>>> uint = u1 - u2 may be incorrect when u1 < u2, _if you think of
>>> unsigned as a positive number_.
>>> But, if you think of it as a natural modulo 2^32, uint = u1-u2 is
>>> always correct, since that's what's happening mathematically.
>>
>> Sounds good. One important consideration is that modulo arithmetic is
>> considerably easier to understand when two's complement and signs are
>> not involved.
>>
>>> I'm strongly of the opinion that you shouldn't be able to generate an
>>> unsigned accidentally -- you should need to either declare a type as
>>> uint, or use the 'u' suffix on a literal.
>>> Right now, properties like 'length' being uint means you get too many
>>> surprising uints, especially when using 'auto'.
>>
>> I am not surprised by length being unsigned. I'm also not surprised by
>> hexadecimal constants being unsigned. (They are unsigned in C. Walter
>> made them signed or not, depending on their value.)
>>
>>> I take your point about not wanting to give up the full 32 bits of
>>> address space. The problem is, that if you have an object x which is
>>> >2GB, and a small object y, then x.length - y.length will
>>> erroneously be negative. If we want code (especially in libraries) to
>>> cope with such large objects, we need to ensure that any time there's
>>> a subtraction involving a length, the first operand is larger than the
>>> second. I think that would preclude the combination:
>>>
>>> length is uint
>>> byte[].length can exceed 2GB, and code is correct when it does
>>> uint - uint is an int (or even, can implicitly convert to int)
>>>
>>> As far as I can tell, at least one of these has to go.
>>
>> Well none has to go in the latest design:
>>
>> (a) One unsigned makes everything unsigned
>>
>> (b) unsigned -> signed is allowed
>>
>> (c) signed -> unsigned is disallowed
>>
>> Of course the latest design has imperfections, but it precludes none of
>> the three things you mention.
>
> It's close, but how can code such as:
>
> if (x.length - y.length < 100) ...
>
> be correct in the presence of length > 2GB?
>
> since
> (a) x.length = uint.max, y.length = 1
> (b) x.length = 4, y.length = 2
> both produce the same binary result (0xFFFF_FFFE = -2)
(You mean x.length = 2, y.length = 4 in the second case.)
> Any subtraction of two lengths has a possible range of
> -uint.max .. uint.max
> which is quite problematic (and the root cause of the problems, I guess).
> And unfortunately I think code is riddled with subtraction of lengths.
Code may be riddled with subtraction of lengths, but it seems to work under
today's rule that the result of such a subtraction is unsigned. So we are
definitely not introducing new problems.
I agree the solution has problems. After this thread, which in turn followed
my sleepless nights poring over the subject, I'm glad to have reached a
design that is better than what we currently have. I think that disallowing
the signed -> unsigned conversions will be a net improvement.
Andrei