Treating the abusive unsigned syndrome

Derek Parnell derek at psych.ward
Thu Nov 27 14:14:14 PST 2008


On Tue, 25 Nov 2008 09:59:01 -0600, Andrei Alexandrescu wrote:

> D pursues compatibility with C and C++ in the following manner: if a 
> code snippet compiles in both C and D or C++ and D, then it should have 
> the same semantics.

Interesting ... but I don't think that this should be the principle
employed. If code is 'naughty' in C/C++ then D should not reproduce the
same results.

I would propose that a better principle to be used would be that the
compiler will not allow loss or distortion of information without the
coder/reader being made aware of it.

 
> (1) u + i, i + u
> (2) u - i, i - u
> (3) u - u
> (4) u * i, i * u, u / i, i / u, u % i, i % u (compatibility with C 
> requires that these all return unsigned, ouch)
> (5) u < i, i < u, u <= i etc. (all ordering comparisons)
> (6) -u

Note that "(3) u - u" and "(6) -u" seem to be really a use of (4), namely
"(-1 * u)".

I am assuming that there is no difference between 'unsigned' and 'positive',
insofar as I am not treating 'unsigned' as 'sign unknown/irrelevant'.

It seems to me that the issue then is not so much one of sign but of size.
A signed type needs an extra bit to hold the sign information, thus a 32-bit
unsigned value needs a minimum of 33 bits to convert it to a signed
equivalent.
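
A minimal sketch of the size argument, assuming D's fixed widths (32-bit
int/uint, 64-bit long). The variable names are only for illustration.

```d
void main()
{
    uint u = uint.max;            // 4_294_967_295 uses all 32 bits for magnitude

    int  asInt  = cast(int) u;    // same 32 bits reread as signed: value is lost
    long asLong = u;              // 64 bits leave room for the sign: value kept

    assert(asInt == -1);
    assert(asLong == 4_294_967_295L);
}
```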
 
In cases (1) - (4) above, I would have the compiler compute a signed
type for the result. Then if the target of the result is a signed type AND
larger than the 'unsigned' portion used, the compiler would not have
to complain. In every other case the compiler should complain because of
the potential for information loss. To avoid the complaint, the coder would
need to either change the result type, change the input types, or add a
'message' to the compiler that in effect says "I know what I'm doing, ok?" - I
suggest a cast would suffice.

In those cases where the target type is not explicitly coded, such as using
'auto' or as a temporary value in an expression, the compiler should assume
a signed type that is 'one step' larger than the 'unsigned' element in the
expression.

e.g.
   auto x = int * uint; ==> 'x' is long.

If this causes code to be incompatible to C/C++, then it implies that the
C/C++ code was poor (i.e. potential information loss) in the first place
and deserves to be fixed up.
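
To make the 'one step larger' idea concrete, here is a rough sketch that
contrasts the C-compatible result with an explicitly widened one. The helper
widenedMul is hypothetical; it only imitates by hand what the proposal would
have the compiler do for 'auto x = int * uint'.

```d
// Hypothetical helper: multiply in a signed type one step wider than the operands.
long widenedMul(int i, uint u)
{
    // i is widened first, so u is converted to long as well and the
    // product cannot wrap around.
    return cast(long) i * u;
}

void main()
{
    int  i = -2;
    uint u = 3;

    // Under the C-compatible rules the result type is uint and the value wraps.
    auto wrapped = i * u;
    static assert(is(typeof(wrapped) == uint));
    assert(wrapped == 4_294_967_290u);

    // Widened by hand, the mathematically expected value survives.
    auto widened = widenedMul(i, u);
    static assert(is(typeof(widened) == long));
    assert(widened == -6);
}
```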

Scenario (5) above should also include equality comparisons, and
should cause the compiler to issue a message AND generate code like ...

   if (u < i)  ====> if ( i < 0 ? false : u < cast(typeof(u))i)
   if (u <= i) ====> if ( i < 0 ? false : u <= cast(typeof(u))i)
   if (u == i) ====> if ( i < 0 ? false : u == cast(typeof(u))i)
   if (u >= i) ====> if ( i < 0 ? true  : u >= cast(typeof(u))i)
   if (u > i)  ====> if ( i < 0 ? true  : u > cast(typeof(u))i)

The coder should be able to avoid the message and the suboptimal generated
code by adding a cast ...

  if (u < cast(typeof(u))i)
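
The comparison rewrites above can be sketched as ordinary functions. The
names lessThan, greaterThan and equalTo are hypothetical stand-ins for the
code the compiler would generate, fixed to uint and int for brevity.

```d
// Hypothetical helpers mirroring the proposed rewrites for uint vs int.
bool lessThan(uint u, int i)
{
    // A negative i is below every unsigned value, so u < i is false.
    return i < 0 ? false : u < cast(uint) i;
}

bool greaterThan(uint u, int i)
{
    // A negative i is below every unsigned value, so u > i is true.
    return i < 0 ? true : u > cast(uint) i;
}

bool equalTo(uint u, int i)
{
    // A negative i can never equal an unsigned value.
    return i < 0 ? false : u == cast(uint) i;
}

void main()
{
    uint u = 1;
    int  i = -1;

    // What the C-compatible conversion amounts to: i becomes 4_294_967_295.
    assert(u < cast(uint) i);

    // The guarded forms give the mathematically expected answers.
    assert(!lessThan(u, i));
    assert(greaterThan(u, i));
    assert(!equalTo(uint.max, i));
}
```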

I am also assuming that the syntax 'cast(unsigned-type)signed-value' tells
the compiler to assume that the bits in the signed-value already represent
a valid unsigned-value, and therefore the compiler should not generate
code to 'transform' the signed-value bits to form an unsigned-value.
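
A small sketch of that assumption. The contrast with Phobos' std.conv.to,
which checks the value and throws instead of reinterpreting it, is my own
addition for comparison and is not part of the proposal.

```d
import std.conv : to, ConvOverflowException;
import std.exception : assertThrown;

void main()
{
    int i = -1;

    // cast keeps the bit pattern and performs no range check.
    uint u = cast(uint) i;
    assert(u == uint.max);
    assert(cast(int) u == i);   // the same bits read back as signed

    // A checked conversion refuses the negative value instead.
    assertThrown!ConvOverflowException(to!uint(i));
}
```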


To summarize, 
(1) Perpetuating poor quality C/C++ code should not be encouraged. 
(2) The compiler should help the coder be aware of potential information
loss.
(3) The coder should have mechanisms to override the compiler's concerns.

-- 
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell
