Portability bug in integral conversion

Andrei Alexandrescu SeeWebsiteForEmail at erdani.org
Sun Jan 16 19:38:55 PST 2011


On 1/16/11 9:32 PM, Graham St Jack wrote:
> On 17/01/11 13:30, Andrei Alexandrescu wrote:
>> On 1/16/11 7:51 PM, Graham St Jack wrote:
>>> On 17/01/11 10:39, Andrei Alexandrescu wrote:
>>>> On 1/16/11 5:24 PM, Graham St Jack wrote:
>>>>> On 16/01/11 08:52, Andrei Alexandrescu wrote:
>>>>>> We've spent a lot of time trying to improve the behavior of integral
>>>>>> types in D. For the most part, we succeeded, but the success was
>>>>>> partial. There was some hope with the polysemy notion, but it
>>>>>> ultimately was abandoned because it was deemed too difficult to
>>>>>> implement for its benefits, which were seen as solving only a minor
>>>>>> annoyance. I was sorry to see it go, and I'm glad that now its day of
>>>>>> reckoning has come.
>>>>>>
>>>>>> Some of the 32-64 portability bugs have come in the following form:
>>>>>>
>>>>>> char * p;
>>>>>> uint a, b;
>>>>>> ...
>>>>>> p += a - b;
>>>>>>
>>>>>> On 32 bits, the code works even if a < b: the difference will become
>>>>>> a large unsigned number, which is then converted to a size_t (which
>>>>>> is a no-op since size_t is uint) and added to p. The pointer itself
>>>>>> is a 32-bit quantity. Due to two's complement properties, the
>>>>>> addition has the same result regardless of the signedness of its
>>>>>> operands.
>>>>>>
>>>>>> On 64 bits, the same code has different behavior. The difference
>>>>>> a - b becomes a large unsigned number (say, 4 billion), which is then
>>>>>> converted to a 64-bit size_t. After conversion the sign is not
>>>>>> extended, so we end up with the number 4 billion on 64-bit. That is
>>>>>> added to a 64-bit pointer, yielding an incorrect value. For the
>>>>>> wraparound to work, the 32-bit uint should have been sign-extended to
>>>>>> 64 bits.
>>>>>>
>>>>>> To fix this problem, one possibility is to mark statically every
>>>>>> result of one of uint-uint, uint+int, uint-int as "non-extensible",
>>>>>> i.e. as impossible to implicitly extend to a 64-bit value. That would
>>>>>> force the user to insert a cast appropriately.
>>>>>>
>>>>>> Thoughts? Ideas?
>>>>>>
>>>>>>
>>>>>> Andrei
>>>>> It seems to me that the real problem here is that it isn't meaningful
>>>>> to perform (a-b) on unsigned integers when (a<b). Attempting to clean
>>>>> up the resultant mess is really papering over the problem. How about a
>>>>> runtime error instead, much like dividing by 0?
>>>>
>>>> That's too inefficient.
>>>>
>>>> Andrei
>>>
>>> If that is the case, then a static check like you are suggesting seems
>>> like a good way to go. Sure it will be annoying, but it will pick up a
>>> lot of bugs.
>>>
>>> This particular problem is one that bites me from time to time because
>>> I tend to use uints wherever it isn't meaningful to have negative
>>> values. It is great until I need to do a subtraction, when I sometimes
>>> forget to check which is greater. Would the check you have in mind
>>> statically check the following as ok?
>>>
>>> where a and b are uints and ptr is a pointer:
>>>
>>> if (a > b) {
>>>     ptr += (a-b);
>>> }
>>
>> That would require flow analysis. I'm not sure we want to embark on
>> that ship. In certain situations value range propagation could take
>> care of it.
>>
>> Andrei
>>
>
> My fear is that if a cast is always required, people will just put one
> in out of habit and we are no better off (just like exception-swallowing).

I don't think it's the same. A cast's target will document the behavior.
Right now we're silently doing the patently wrong thing. Walter
stared at that code for hours. A cast would definitely be a good clue
even if wrong.
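
To make the point about the cast's target concrete, here is a minimal sketch of the two conversions involved. The helper names zeroExtended and signExtended are made up for illustration and are not part of Phobos; the ulong/long return types stand in for what a 64-bit size_t or pointer offset would see.

```d
import std.stdio : writeln;

// Hypothetical helpers, for illustration only.

// What the implicit conversion does today: the uint difference wraps
// modulo 2^32 and is then zero-extended to 64 bits.
ulong zeroExtended(uint a, uint b)
{
    return a - b;
}

// What an explicit cast to int asks for: reinterpret the wrapped
// difference as signed, so widening sign-extends it.
long signExtended(uint a, uint b)
{
    return cast(int)(a - b);
}

void main()
{
    uint a = 3, b = 5;

    assert(zeroExtended(a, b) == 4_294_967_294UL); // 2^32 - 2
    assert(signExtended(a, b) == -2);

    // The same cast applied to pointer arithmetic inside a buffer.
    char[16] buf;
    char* p = buf.ptr + 8;
    p += cast(int)(a - b);      // steps back two elements on 32 and 64 bits
    assert(p == buf.ptr + 6);

    writeln("zero-extended: ", zeroExtended(a, b));
    writeln("sign-extended: ", signExtended(a, b));
}
```

The cast is only right when the true difference fits in an int, but a reader at least sees which interpretation was intended.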

> Is the cost of run-time checking really prohibitive?

Yes. There is no question about that. This is not negotiable.

> Correct code should
> have some checking anyway. Maybe providing phobos functions to perform
> various correct-usage operations with run-time checks like in my code
> fragment above would be useful. They could do the cast, and most of the
> annoyance factor would be dealt with. A trivial example:
>
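> int difference(uint a, uint b) {
>     // Assumes the magnitude of the difference fits in an int.
>     if (a >= b) {
>         // Parenthesized so the cast applies to the difference, not just a.
>         return cast(int) (a - b);
>     }
>     else {
>         return -(cast(int) (b - a));
>     }
> }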

The general approach is to define properly bounded types with 
policy-based checking.
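
A rough sketch of what that could look like, under the assumption that the policy is a type exposing a static onUnderflow hook. Every name here (CheckedUint, Throw, Saturate, Wrap, onUnderflow) is hypothetical and is not an existing Phobos API; only subtraction is handled to keep it short.

```d
import std.exception : assertThrown;

// Hypothetical policies: each decides what a - b means when a < b.
struct Throw
{
    static uint onUnderflow(uint a, uint b)
    {
        throw new Exception("unsigned subtraction underflow");
    }
}

struct Saturate
{
    static uint onUnderflow(uint a, uint b) { return 0; }
}

struct Wrap
{
    // Keeps the built-in modulo 2^32 behavior, with no extra cost beyond the test.
    static uint onUnderflow(uint a, uint b) { return a - b; }
}

// Hypothetical wrapper: a uint whose subtraction defers to the policy.
struct CheckedUint(Policy)
{
    uint value;

    CheckedUint opBinary(string op)(CheckedUint rhs) const
        if (op == "-")
    {
        if (value < rhs.value)
            return CheckedUint(Policy.onUnderflow(value, rhs.value));
        return CheckedUint(value - rhs.value);
    }
}

void main()
{
    // In-range subtraction behaves the same under every policy.
    assert((CheckedUint!Throw(5) - CheckedUint!Throw(3)).value == 2);

    // Out-of-range subtraction is where the policies differ.
    assert((CheckedUint!Saturate(3) - CheckedUint!Saturate(5)).value == 0);
    assert((CheckedUint!Wrap(3) - CheckedUint!Wrap(5)).value == 4_294_967_294u);
    assertThrown(CheckedUint!Throw(3) - CheckedUint!Throw(5));
}
```

The user pays for checking only where a checked type is chosen, and the policy name at the declaration documents the intended behavior.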


Andrei

