Casts, overflows and demonstrations

Tue Jun 5 16:12:49 PDT 2012

This is a reduced part of some D code:

import std.bigint, std.conv, std.algorithm, std.range;

void foo(BigInt number)
in {
     assert(number >= 0);
} body {
     ubyte[] digits = text(number + 1)
                      .retro()
                      .map!(c => cast(ubyte)(c - '0'))()
                      .array();
     // ...
}

void main() {}

The important line of code adds one to 'number', converts the 
result to a string, scans that string starting from its end, and 
for each char (digit) finds its value by subtracting the ASCII 
value of '0', then casts the result to ubyte. Finally it converts 
the lazy range to an array, a ubyte[].

The cast in the D code is needed because 'c' is a character. If you 
subtract '0' from a char, in D the result is an int (for the dchar 
elements that a string range yields, it is a uint), and D doesn't 
allow assigning that wider value to a ubyte implicitly (I guess the 
compiler performs range analysis on the expression, so it knows 
the result can fall outside 0..255), to avoid losing information.
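
A minimal sketch of that rule, assuming a plain char variable (the 
helper name digitOf is made up for this illustration, it is not in 
the original code). The static asserts show what the compiler 
accepts and rejects; the last line of main shows why the cast is 
risky:

```d
// digitOf is a hypothetical helper, used only for this illustration.
ubyte digitOf(char c)
{
    // char - char is promoted to int.
    static assert(is(typeof(c - '0') == int));

    // The possible values are -48 .. 207, so implicit narrowing is rejected.
    static assert(!__traits(compiles, { ubyte u = c - '0'; }));

    // Masking brings the range to 0 .. 255, and then no cast is needed.
    static assert(__traits(compiles, { ubyte u = (c - '0') & 0xFF; }));

    return cast(ubyte)(c - '0');
}

void main()
{
    assert(digitOf('7') == 7);
    // The cast silently wraps on a non-digit: '-' is 45, and 45 - 48 = -3.
    assert(digitOf('-') == 253);
}
```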

Casts are dangerous, so it's better to avoid them where possible. 
A cast looks kind of safe because you usually know what you are 
doing while you write it. But when you later change other parts of 
the code, the cast stays silent, and maybe it's no longer casting 
from the type you think it is. Maybe that kind of bug can be 
avoided with a templated function like this one, which makes both 
the from and to types explicit (it doesn't compile if the from type 
is wrong). This code is not fully correct, the __traits constraint 
is not working well:

template Cast(From, To)
if (__traits(compiles, cast(To)From.init)) {
     To Cast(T)(T x) if (is(T == From)) {
         return cast(To)x;
     }
}
void main() {
     int x = -100;
     ubyte y = Cast!(int, ubyte)(x);
     string s = "123";
     int y2 = Cast!(string, int)(s);
}
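
A simpler variant of the same idea, as a sketch only: a single 
function template where From and To are given explicitly and the 
argument type T is inferred and required to match From. The name 
checkedCast is hypothetical, and this version leaves out the 
__traits test, on the assumption that an impossible cast is still 
reported as a compile error inside the function body when the 
template is instantiated:

```d
// checkedCast is a hypothetical name. From and To are written by the caller,
// T is inferred from the argument and must be exactly From.
To checkedCast(From, To, T)(T x)
if (is(T == From))
{
    return cast(To) x;
}

void main()
{
    int x = -100;
    ubyte y = checkedCast!(int, ubyte)(x);
    assert(y == 156); // -100 wraps modulo 256

    // If the variable later becomes a long, the stated From type no longer
    // matches and the call is rejected at compile time.
    long big = -100;
    static assert(!__traits(compiles, checkedCast!(int, ubyte)(big)));
}
```

This doesn't make the narrowing itself any safer, it only stops the 
cast from silently applying to a type you didn't expect.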

The following code is similar, but to!() performs a run-time test 
that makes sure the subtraction result is representable in a 
ubyte, and otherwise throws an exception:

ubyte[] digits = text(number + 1)
                  .retro()
                  .map!(c => to!ubyte(c - '0'))()
                  .array();

That code is safer than the cast, but it pays for a run-time test 
on every digit, even though every char here is known to be a digit.
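
To make the difference with the cast concrete, here is a small 
sketch of what to!ubyte does on a value that doesn't fit (the 
helper name digitChecked is made up for this example):

```d
import std.conv : to, ConvOverflowException;
import std.exception : assertThrown;

// digitChecked is a hypothetical helper name.
ubyte digitChecked(char c)
{
    // c - '0' is an int; to!ubyte checks at run time that it fits in 0 .. 255.
    return to!ubyte(c - '0');
}

void main()
{
    assert(digitChecked('7') == 7);
    // '-' is 45, so the difference is -3 and the conversion throws
    // instead of wrapping around.
    assertThrown!ConvOverflowException(digitChecked('-'));
}
```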

In theory a smarter compiler (working on good enough code) is 
able to do better: text() calls a BigInt method that returns the 
textual representation of the value in base ten (today that 
method is toString(), but maybe this situation will change and 
improve). BigInt.toString() could have a post-condition like this:

string toString()
out(result) {
   size_t start = 0;
   if (this < 0) {
     assert(result[0] == '-');
     start = 1;
   }
   foreach (digit; result[start .. $])
     assert(digit >= '0' && digit <= '9');
   // If you want you can also assert that the first
   // digit is zero only if the bigint value is zero.
} body {
   // ...
}

Given that information, plus the foo pre-condition 
in{assert(number >= 0);}, a smart compiler is able to infer (or 
asks the programmer to demonstrate) that number + 1 is positive, 
so text() returns an array of just ['0',..,'9'] chars with no 
leading '-', and that retro() doesn't change the contents of the 
range, so if you subtract '0' from each of them you get a number 
in [0,..,9] that is always representable in a ubyte. So no cast is 
needed.
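
Until a compiler can do that, the same chain of reasoning can be 
written by hand. This is only a sketch of a stand-in, not what such 
a compiler would do: it checks the "digits only" property once on 
the whole string with an assert (which is removed in release 
builds), and after that the cast is known to be harmless. The 
function name reversedDigits is made up here, and 'do' is the 
current spelling of the 'body' keyword used in the code above:

```d
import std.algorithm : all, map;
import std.array : array;
import std.ascii : isDigit;
import std.bigint : BigInt;
import std.conv : text;
import std.range : retro;

// reversedDigits is a hypothetical name for the reduced foo() above.
ubyte[] reversedDigits(BigInt number)
in {
    assert(number >= 0);
} do {
    string s = text(number + 1);

    // Hand-written stand-in for the toString() post-condition:
    // number + 1 is positive, so every char must be a decimal digit.
    assert(s.all!isDigit);

    // Each value is in 0 .. 9, so this cast cannot lose information.
    return s.retro.map!(c => cast(ubyte)(c - '0')).array;
}

void main()
{
    ubyte[] expected = [3, 2, 1];
    assert(reversedDigits(BigInt(122)) == expected); // 122 + 1 = 123, reversed
}
```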

Now and then I take a look at the ongoing development and 
refinement of the "Modern Eiffel" language (it's a kind of 
Eiffel2, see 
http://tecomp.sourceforge.net/index.php?file=doc/papers/lang/modern_eiffel.txt 
), which is supposed to be able (or to become able) to perform 
those inferences (or to use them if the programmer has demonstrated 
them), so I think it will be able to spare both that cast and the 
run-time tests on each char, while still avoiding overflow bugs.

According to Bertrand Meyer and others, in 20 years similar things 
are going to become a part of the normal programming experience.

Bye,
bearophile