Treating the abusive unsigned syndrome

Thu Nov 27 14:23:12 PST 2008

Derek Parnell wrote:
> On Tue, 25 Nov 2008 09:59:01 -0600, Andrei Alexandrescu wrote:
> 
>> D pursues compatibility with C and C++ in the following manner: if a 
>> code snippet compiles in both C and D or C++ and D, then it should have 
>> the same semantics.
> 
> Interesting ... but I don't think that this should be the principle
> employed. If code is 'naughty' in C/C++ then D should not also produce the
> same results.
> 
> I would propose that a better principle to be used would be that the
> compiler will not allow loss or distortion of information without the
> coder/reader being made aware of it.

These two principles are not necessarily at odds with each other. The 
idea of being compatible with C and C++ is simple: if I paste a C 
function from somewhere into a D module, the function should either not 
compile, or compile and run with the same result. I think that's quite 
reasonable. So if the C code is behaving naughtily, D doesn't need to 
also behave naughtily. It should just not compile.

>> (1) u + i, i + u
>> (2) u - i, i - u
>> (3) u - u
>> (4) u * i, i * u, u / i, i / u, u % i, i % u (compatibility with C 
>> requires that these all return unsigned, ouch)
>> (5) u < i, i < u, u <= i etc. (all ordering comparisons)
>> (6) -u
> 
> Note that "(3) u - u" and "(6) -u" seem to be really a use of (4), namely
> "(-1 * u)".

Correct.
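
For reference, here is a small C sketch of how cases (1), (3), (5) and
(6) behave today under C's usual arithmetic conversions. This is the
behavior that pasted C code would either have to reproduce in D or fail
to compile. The function names are made up for illustration only.

```c
#include <assert.h>
#include <limits.h>

/* (1) u + i: i is converted to unsigned, and the sum is unsigned. */
unsigned add_ui(unsigned u, int i) { return u + i; }

/* (3) u - u: wraps modulo 2^N instead of going negative. */
unsigned sub_uu(unsigned a, unsigned b) { return a - b; }

/* (5) u < i: i is converted to unsigned before the comparison. */
int less_ui(unsigned u, int i) { return u < i; }

/* (6) -u: still unsigned, so a nonzero u yields 2^N - u. */
unsigned neg_u(unsigned u) { return -u; }

/* Self-checks; call this from main() to run them. */
void check_mixed_sign_cases(void)
{
    assert(add_ui(1u, -2) == UINT_MAX);  /* not -1 */
    assert(sub_uu(1u, 2u) == UINT_MAX);  /* not -1 */
    assert(less_ui(1u, -1) == 1);        /* 1 < -1 comes out "true" */
    assert(neg_u(1u) == UINT_MAX);       /* not -1 */
}
```

Many compilers warn about the mixed-sign comparison in less_ui() when
asked to, but the code is valid C and compiles.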

> I am assuming that there is no difference between 'unsigned' and 'positive',
> inasmuch as I am not treating 'unsigned' as 'sign unknown/irrelevant'. 
> 
> It seems to me that the issue then is not so much one of sign but of size.
> It needs an extra bit to hold the sign information thus a 32-bit unsigned
> value needs a minimum of 33 bits to convert it to a signed equivalent.
>  
> In the types (1) - (4) above, I would have the compiler compute a signed
> type for these. Then if the target of the result is a signed type AND
> larger than the 'unsigned' portion used, then the compiler would not have
> to complain. In every other case the compiler should complain because of
> the potential for information loss. To avoid the complaint, the coder would
> need to either change the result type, the input types or add a 'message'
> to the compiler that in effect says "I know what I'm doing, ok?" - I
> suggest a cast would suffice.
> 
> In those cases where the target type is not explicitly coded, such as using
> 'auto' or as a temporary value in an expression, the compiler should assume
> a signed type that is 'one step' larger than the 'unsigned' element in the
> expression.
> 
> e.g.
>    auto x = int * uint; ==> 'x' is long.

I don't think this will fly with Walter.

> If this causes code to be incompatible to C/C++, then it implies that the
> C/C++ code was poor (i.e. potential information loss) in the first place
> and deserves to be fixed up.

I don't quite think so. As long as the values are within range, the 
multiplication is legit and efficient.
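
To illustrate the trade-off, here is a rough C sketch comparing the C
rule for int * unsigned with the widening rule proposed above. The
function names are hypothetical, and the widened version assumes long
long is wider than unsigned, which holds on common platforms but is not
guaranteed by the C standard.

```c
#include <assert.h>
#include <limits.h>

/* C rule: i is converted to unsigned, and the product is unsigned. */
unsigned mul_c_style(int i, unsigned u) { return i * u; }

/* Widening rule (assumed here): compute in a larger signed type. */
long long mul_widened(int i, unsigned u)
{
    return (long long)i * (long long)u;
}

/* Self-checks; call this from main() to run them. */
void check_mul(void)
{
    /* Values within range: both rules agree. */
    assert(mul_c_style(6, 7u) == 42u);
    assert(mul_widened(6, 7u) == 42);

    /* Negative operand: the C rule wraps, the widened one does not. */
    assert(mul_c_style(-1, 2u) == UINT_MAX - 1u);
    assert(mul_widened(-1, 2u) == -2);
}
```

The in-range case is the one I have in mind: it costs a single multiply
in the native width, whereas the widened form needs a wider multiply
even when the operands never come near the limits.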

> The scenario (5) above should also include equality comparisons, and
> should cause the compiler to issue a message AND generate code like ...
> 
>    if (u < i)  ====> if ( i < 0 ? false : u < cast(typeof(u))i)
>    if (u <= i) ====> if ( i < 0 ? false : u <= cast(typeof(u))i)
>    if (u == i) ====> if ( i < 0 ? false : u == cast(typeof(u))i)
>    if (u >= i) ====> if ( i < 0 ? true  : u >= cast(typeof(u))i)
>    if (u > i)  ====> if ( i < 0 ? true  : u > cast(typeof(u))i)
> 
> The coder should be able to avoid the message and the suboptimal generated
> code by adding a cast ...
> 
>   if (u < cast(typeof(u))i) 

Yah, comparisons need to be looked at too.
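
For the record, the rewrites quoted above can be sketched as plain C
helpers. This is only an illustration of the proposed lowering, not
something any compiler emits today, and the safe_* names are made up.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* A negative i is below every unsigned value and equal to none. */
bool safe_less(unsigned u, int i)
{ return i < 0 ? false : u < (unsigned)i; }

bool safe_less_eq(unsigned u, int i)
{ return i < 0 ? false : u <= (unsigned)i; }

bool safe_equal(unsigned u, int i)
{ return i < 0 ? false : u == (unsigned)i; }

bool safe_greater_eq(unsigned u, int i)
{ return i < 0 ? true : u >= (unsigned)i; }

bool safe_greater(unsigned u, int i)
{ return i < 0 ? true : u > (unsigned)i; }

/* Self-checks; call this from main() to run them. */
void check_compare(void)
{
    /* With the plain C conversion, 1u < -1 evaluates to true. */
    assert(1u < (unsigned)-1);
    assert(!safe_less(1u, -1));
    assert(safe_greater(1u, -1));

    /* With the plain C conversion, UINT_MAX == -1 evaluates to true. */
    assert(UINT_MAX == (unsigned)-1);
    assert(!safe_equal(UINT_MAX, -1));

    /* Non-negative i: same answers as the ordinary comparison. */
    assert(safe_less(1u, 2));
    assert(safe_equal(3u, 3));
}
```

Each helper costs one extra test of i against zero, which is the
suboptimal code that the explicit cast is meant to let the coder avoid.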

Andrei