'int' is enough for 'length' to migrate code from x86 to x64
Don via Digitalmars-d
digitalmars-d at puremagic.com
Mon Nov 24 02:20:22 PST 2014
On Friday, 21 November 2014 at 20:17:12 UTC, Walter Bright wrote:
> On 11/21/2014 7:36 AM, Don wrote:
>> On Friday, 21 November 2014 at 04:53:38 UTC, Walter Bright
>> wrote:
>>> 0 crossing bugs tend to show up much sooner, and often
>>> immediately.
>>
>>
>> You're missing the point here. The problem is that people are
>> using 'uint' as if it were a positive integer type.
>>
>> Suppose D had a type 'natint', which could hold natural numbers
>> in the range 0..uint.max. Sounds like 'uint', right? People make
>> the mistake of thinking that is what uint is. But it is not.
>>
>> How would natint behave, in the type system?
>>
>> typeof(natint - natint) == int NOT natint !!!
>>
>> This would of course overflow if the result is too big to fit in
>> an int. But the type would be correct. 1 - 2 == -1.
>>
>> But
>>
>> typeof(uint - uint) == uint.
>>
>> The bit pattern is identical to the other case. But the type
>> is wrong.
>>
>> It is for this reason that uint is not appropriate as a model for
>> positive integers. Having warnings about mixing int and uint
>> operations in relational operators is a bit misleading, because
>> mixing signed and unsigned is not usually the real problem.
>> Instead, those warnings are a symptom of a type system mistake.
>>
>> You are quite right in saying that with a signed length, overflows
>> can still occur. But those are in principle detectable. The
>> compiler could add runtime overflow checks for them, for example.
>> But the situation for unsigned is not fixable, because it is a
>> problem with the type system.
>>
>>
>> By making .length unsigned, we are telling people that if .length
>> is used in a subtraction expression, the type will be wrong.
>>
>> It is the incorrect use of the type system that is the
>> underlying problem.
>
> I believe I do understand the problem. As a practical matter,
> overflow checks are not going to be added for performance
> reasons.
The performance overhead would be practically zero. All we would
need to do is restrict array slices such that the length cannot
exceed ssize_t.max. Exceeding that limit is only possible when the
element type has a size of 1, and only when slicing a pointer,
concatenating, or allocating memory.
Making this restriction would have been unreasonable in the 8-
and 16-bit days, but D doesn't support those targets. On 32 bits,
this is an extreme corner case. On 64 bits, the condition never
arises at all.
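To make that concrete, here is a minimal sketch of where the
check would live. boundedSlice is a hypothetical helper written
purely for illustration (it is not actual druntime code), and it
uses ptrdiff_t as the signed counterpart of size_t, i.e. what I
am calling ssize_t above:

    // Hypothetical helper, for illustration only: slicing a
    // pointer under the proposed cap on slice length. Per the
    // argument above, the check can only ever fire when the
    // element size is 1.
    T[] boundedSlice(T)(T* p, size_t len)
    {
        assert(len <= ptrdiff_t.max,
               "slice length exceeds the signed maximum");
        return p[0 .. len];
    }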
In exchange, 99% of uses of unsigned would disappear from D code,
and with them, a whole category of bugs.
> Also, in principle, uint-uint can generate a runtime check for
> underflow (i.e. the carry flag).
No, it cannot. The compiler does not have enough information to
know whether the value is intended to be a positive integer or an
unsigned one. That information is lost from the type system.
E.g. in C, wrapping of an unsigned type is not an error; it is
perfectly well-defined behaviour. With signed types, overflow is
undefined behaviour.
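A short example of why no check is possible. This is plain D, and
it also shows the typeof(uint - uint) == uint behaviour quoted
above:

    void main()
    {
        uint a = 1;
        uint b = 2;
        auto d = a - b;                       // typed uint, not int
        static assert(is(typeof(d) == uint));
        // 1 - 2 wraps to uint.max: perfectly defined behaviour,
        // so no check can distinguish this from intended modular
        // arithmetic.
        assert(d == uint.max);
    }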
To make this clear: I am not proposing that size_t should be
changed.
I am proposing that .length return a signed type which, for array
slices, is guaranteed never to be negative.
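As a hedged sketch of what that buys us (slength is a
hypothetical name, used purely for illustration):

    // Hypothetical signed length: safe, because under this
    // proposal slice lengths would be capped at ptrdiff_t.max.
    ptrdiff_t slength(T)(T[] a) { return cast(ptrdiff_t) a.length; }

    void main()
    {
        auto a = [1, 2, 3];
        auto b = [1, 2, 3, 4];
        assert(a.slength - b.slength == -1); // correct signed result
        // Today, a.length - b.length is size_t and wraps to
        // size_t.max, so any "is it positive?" test gets it wrong.
    }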