Treating the abusive unsigned syndrome

Fri Nov 28 08:09:25 PST 2008

Andrei Alexandrescu wrote:
> (I lost track of quotes, so I yanked them all beyond Don's message.)
> 
> Don wrote:
>> The problem with that is that you're then forcing the 'unsigned is a 
>> natural' interpretation when it may be erroneous.
>>
>> uint.max - 10 is a uint.
>>
>> It's an interesting case, because int = u1 - u2 is definitely 
>> incorrect when u1 > int.max.
>>
>> uint = u1 - u2 may be incorrect when u1 < u2, _if you think of 
>> unsigned as a positive number_.
>> But, if you think of it as a natural modulo 2^32, uint = u1-u2 is 
>> always correct, since that's what's happening mathematically.
> 
> Sounds good. One important consideration is that modulo arithmetic is 
> considerably easier to understand when two's complement and signs are 
> not involved.
> 
>> I'm strongly of the opinion that you shouldn't be able to generate an 
>> unsigned accidentally -- you should need to either declare a type as 
>> uint, or use the 'u' suffix on a literal.
>> Right now, properties like 'length' being uint means you get too many 
>> surprising uints, especially when using 'auto'.
> 
> I am not surprised by length being unsigned. I'm also not surprised by 
> hexadecimal constants being unsigned. (They are unsigned in C. Walter 
> made them signed or not, depending on their value.)
> 
>> I take your point about not wanting to give up the full 32 bits of 
>> address space. The problem is that if you have an object x which is 
>> larger than 2GB, and a small object y, then x.length - y.length will 
>> erroneously be negative if treated as an int. If we want code 
>> (especially in libraries) to 
>> cope with such large objects, we need to ensure that any time there's 
>> a subtraction involving a length, the first is larger than the second. 
>> I think that would preclude the combination:
>>
>> length is uint
>> byte[].length can exceed 2GB, and code is correct when it does
>> uint - uint is an int (or even, can implicitly convert to int)
>>
>> As far as I can tell, at least one of these has to go.
> 
> Well, none has to go in the latest design:
> 
> (a) One unsigned makes everything unsigned
> 
> (b) unsigned -> signed is allowed
> 
> (c) signed -> unsigned is disallowed
> 
> Of course the latest design has imperfections, but it precludes none of 
> the three things you mention.

It's close, but how can code such as:

if (x.length - y.length < 100) ...

be correct in the presence of length > 2GB?

Since
(a) x.length = uint.max, y.length = 1
(b) x.length = 2, y.length = 4
both produce the same binary result (0xFFFF_FFFE, i.e. -2 as an int).

Any subtraction of two lengths has a possible range of
  -uint.max .. uint.max
which no 32-bit type can represent. That is quite problematic (and the root cause of the problems, I guess).
And unfortunately I think code is riddled with subtraction of lengths.