Treating the abusive unsigned syndrome

Andrei Alexandrescu SeeWebsiteForEmail at erdani.org
Wed Nov 26 10:30:30 PST 2008


Don wrote:
> Michel Fortin wrote:
>> On 2008-11-26 10:24:17 -0500, Andrei Alexandrescu 
>> <SeeWebsiteForEmail at erdani.org> said:
>>
>>> Also consider:
>>>
>>> auto delta = a1.length - a2.length;
>>>
>>> What should the type of delta be? Well, it depends. In my scheme that 
>>> wouldn't even compile, which I think is a good thing; you must decide 
>>> whether prior information makes it an unsigned or a signed integral.
>>
>> In my scheme it would give you a uint. You'd have to cast to get a 
>> signed integer... I see how it's not ideal, but I can't imagine how it 
>> could be coherent otherwise.
>>
>>     auto diff = cast(int)a1.length - cast(int)a2.length;
> 
> Actually, there's no solution.

There is. We need to find the block of marble it's in and then chip the 
extra marble off it.

> Imagine a 32-bit system, where one object can be greater than 2GB in 
> size (not possible in Windows AFAIK, but theoretically possible).

It is possible in Windows if you change some I-forgot-which parameter in 
boot.ini.

> Then 
> if a1 is 3GB, delta cannot be stored in an int. If a2 is 3GB, it 
> requires an int for storage, since the result is less than 0.
> 
> ==> I think length has to be an int. It's less bad than uint.
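
Numerically, by the way, the point checks out in both directions; a 
quick compile-time check (3GB per your example; the exact constants 
are mine):

void main()
{
    enum ulong threeGB = 3UL * 1024 * 1024 * 1024; // 3_221_225_472
    // A delta of +3GB does not fit in an int...
    static assert(threeGB > int.max);
    // ...and a delta of -3GB fits neither int nor uint, so no single
    // 32-bit type holds every possible a1.length - a2.length.
    static assert(-cast(long) threeGB < int.min);
}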

I'm not sure how the conclusion follows from the premises, but consider 
this. If someone deals with large arrays, they always have the option 
of writing things like:

if (a1.length >= a2.length) {
    size_t delta = a1.length - a2.length;
    ... use delta ...
} else {
    size_t rDelta = a2.length - a1.length;
    ... use rDelta ...
}
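
Fleshed out into something runnable (the arrays here are made up purely 
for illustration):

import std.stdio;

void main()
{
    auto a1 = [1, 2, 3, 4, 5];
    auto a2 = [1, 2];
    if (a1.length >= a2.length) {
        size_t delta = a1.length - a2.length;
        writeln("a1 is longer by ", delta); // prints: a1 is longer by 3
    } else {
        size_t rDelta = a2.length - a1.length;
        writeln("a2 is longer by ", rDelta);
    }
}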

I'm not saying it's better than sliced bread, but it is a solution. And 
it is correct on all systems. And it cooperates with the typechecker by 
adding flow information, to which typecheckers are usually oblivious. 
And the types are out in the clear. And it's the programmer, not the 
compiler, who decides the signedness.

In contrast, using ints for array lengths beyond 2GB is a nightmare. 
I'm not saying it's a frequent case, but since you woke up the sleeping 
dog, I'm just barking :o).

>> Perhaps we could add a "sign" property to uint and an "unsign" 
>> property to int that'd give you the signed or unsigned corresponding 
>> value and which could do range checking at runtime (enabled by a 
>> compiler flag).
>>
>>     auto diff = a1.length.sign - a2.length.sign;
>>
>> And for the general problem of "uint - uint" giving a result below 
>> uint.min, as I said in my other post, that could be handled by a 
>> runtime check (enabled by a compiler flag) just like array bound 
>> checking.
> 
> That's not bad.
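
The sign/unsign half, at least, needs no language change; here's a 
minimal library sketch (the names are Michel's, the rendering is one 
hypothetical take, with UFCS making the calls read like properties and 
the range check always on rather than behind a flag):

import std.exception : enforce;

ptrdiff_t sign(size_t x)
{
    enforce(x <= ptrdiff_t.max, "value too large for a signed type");
    return cast(ptrdiff_t) x;
}

size_t unsign(ptrdiff_t x)
{
    enforce(x >= 0, "negative value has no unsigned equivalent");
    return cast(size_t) x;
}

void main()
{
    auto a1 = new int[10];
    auto a2 = new int[20];
    auto diff = a1.length.sign - a2.length.sign; // signed; == -10
    assert(diff == -10);
}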

Well, let's look closer at this. Consider a system in which the current 
rules are in force, plus the overflow check for uint.

auto i = arr.length - offset1 + offset2;

Although the context makes it clear that offset1 < offset2, and 
therefore i is within range and won't overflow, the poor code generator 
has no choice but to insert checks throughout. Even though the entire 
expression always yields the correct result, it will fail dynamically 
on the way to that result.
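
To make that concrete (all values invented):

void main()
{
    auto arr = new int[5];            // arr.length == 5
    size_t offset1 = 7, offset2 = 9;  // offset1 > arr.length, yet
                                      // offset1 < offset2 holds

    // The intermediate arr.length - offset1 wraps below zero, so a
    // per-operation underflow check would fire right here...
    auto i = arr.length - offset1 + offset2;

    // ...even though modular arithmetic delivers the correct value.
    assert(i == 7);
}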

Contrast this with the proposed system, in which the expression will 
not compile. It does require the user to somewhat redundantly insert 
guidance for such operations, but the cost is paid during compilation, 
not through runtime failure.

>>>
>>> Fine. With constants there is some mileage that can be squeezed. But 
>>> let's keep in mind that that doesn't solve the larger issue.
>>
>> Well, by making implicit conversions between uint and int illegal, 
>> we're solving the larger issue. Just not in a seamless manner.
> 
> We are of one mind. I think that constants are the root cause of the 
> problem.

Well, I strongly disagree. (I assume you mean "literals", not 
"constants".) I see literals as just a small part of the signedness 
mess. Moreover, I believe that creating symbolic names with "auto" in 
fact compounds the problem, which runs straight against your belief 
that it's all about literals. No, IMHO it's about espousing wrong 
beliefs and then propagating them through auto!
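
Here is the kind of propagation I mean (a minimal, made-up case):

void main()
{
    int[] arr;               // empty, so arr.length == 0
    auto n = arr.length;     // auto quietly infers size_t: unsigned
    auto d = n - 1;          // the literal 1 is innocent; the
                             // propagated unsigned type makes d wrap
    assert(d == size_t.max);
}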

Maybe if you walked me through your reasoning on why literals are so 
important, I could be convinced. As far as my code is concerned, I tend 
to loosely follow the old adage "the only literals in a program should 
be 0, 1, and -1". True, the adage doesn't say how many of these three 
may reasonably occur, but at the end of the day I'm still puzzled by 
this alleged importance of literals.


Andrei


