'int' is enough for 'length' to migrate code from x86 to x64

Don via Digitalmars-d digitalmars-d at puremagic.com
Tue Nov 25 00:47:35 PST 2014


On Monday, 24 November 2014 at 21:34:19 UTC, Walter Bright wrote:
> On 11/24/2014 2:20 AM, Don wrote:
>>> I believe I do understand the problem. As a practical matter, 
>>> overflow checks
>>> are not going to be added for performance reasons.
>>
>> The performance overhead would be practically zero. All we 
>> would need to do, is
>> restrict array slices such that the length cannot exceed 
>> ssize_t.max.
>>
>> This can only happen in the case where the element type has a 
>> size of 1, and
>> only in the case of slicing a pointer, concatenation, and 
>> memory allocation.
>
> (length1 + length2) / 2

That's not an issue with length; it's an issue with doing a 
calculation with an insufficient bit width. Unsigned doesn't 
actually help; the result is still wrong.

For unsigned values, if length1 = length2 = 0x8000_0000, that 
gives an answer of 0.


>> In exchange, 99% of uses of unsigned would disappear from D 
>> code, and with it, a
>> whole category of bugs.
>
> You're not proposing changing size_t, so I believe this 
> statement is incorrect.

 From the D code that I've seen, almost all uses of size_t come 
directly from the use of .length. But I concede (see below) that 
many of them come from .sizeof.

>>> Also, in principle, uint-uint can generate a runtime check 
>>> for underflow (i.e.
>>> the carry flag).
>>
>> No it cannot. The compiler does not have enough information to 
>> know whether the value is intended to be a positive integer or 
>> an unsigned one. That information is lost from the type system.
>>
>> Eg from C, wrapping of an unsigned type is not an error. It is 
>> perfectly defined
>> behaviour. With signed types, it's undefined behaviour.
>
> I know it's not an error. It can be defined to be an error, and 
> the compiler can insert a runtime check. (I'm not proposing 
> this, just saying it can be done.)

But it can't do that without turning unsigned into a different 
type. You'd be turning unsigned into a 'non-negative' type, which 
is a completely different thing. This is my whole point.

unsigned has no sign: you just get the raw bit pattern with no 
interpretation.
This can mean several things, for example:
1. An 'extended non-negative', where you are using it for the 
positive range 0 .. +0xFFFF_FFFF.
   Then, overflow and underflow are errors.
2. A value where the highest bit is always 0. This can be safely 
used as int or uint.
3. Modulo 2^^32 arithmetic, where wrapping is intended.
4. Part of extended-precision arithmetic, where you want the 
carry flag.
5. Just a raw bit pattern.
6. The high bit is a sign bit. This is a signed type, cast to 
uint.
If the sign bit ever flips because of a carry, that's an error.

The type system doesn't specify a meaning for the bit pattern. 
We've got a special type for case 6, but not for the others.

The problem with unsigned is that it can mean so many things; 
it's as if it were a union of these possibilities. So it's not 
strictly typed -- you need to be careful, which requires some 
element of faith-based programming.

And "signed-unsigned mismatch" is really where you are implicitly 
assuming that the unsigned value is case 2 or case 6. But if it 
is one of the other cases, you get nonsense.

But those "signed-unsigned mismatch" errors only catch some of 
the possible cases where you may forget which interpretation you 
are using, and act as if it were another one.


>> To make this clear: I am not proposing that size_t should be 
>> changed.
>> I am proposing that .length return a signed type, which for 
>> array slices is guaranteed never to be negative.
>
> There'll be mass confusion if .length is not the same type as 
> .sizeof

Ah, that is a good point. .sizeof is another source of unsigned, 
and quite unnecessarily: can a single type ever actually use up 
half of the memory space? (It was possible in the 8- and 16-bit 
days, but it's hard to imagine today.) Even sillier, it is nearly 
always known at compile time!

But still, .sizeof is low-level in a way that .length is not.

