A Value Range Propagation usage example, and more

Wed Oct 8 05:17:35 PDT 2014

This is the first part of a function to convert to base 58 (some 
letters are missing, like the upper case "I") used in the Bitcoin 
protocol:

alias Address = ubyte[1 + 4 + RIPEMD160_digest_len];

char[] toBase58(ref Address a) pure nothrow @safe {
     static immutable symbols = "123456789" ~
                                "ABCDEFGHJKLMNPQRSTUVWXYZ" ~
                                "abcdefghijkmnopqrstuvwxyz";
     static assert(symbols.length == 58);

     auto result = new typeof(return)(34);
     foreach_reverse (ref ri; result) {
         uint c = 0;
         foreach (ref ai; a) {
             c = c * 256 + ai;
             ai = cast(ubyte)(c / symbols.length);
             c %= symbols.length;
         }
         ri = symbols[c];
     }
     ...
}

The D type system isn't smart enough to see that "ai" is always 
fitting in an ubyte, so I have had to use a cast(ubyte). But 
casts are dangerous and their usage should be minimized, and 
to!ubyte is slow and makes the function not nothrow. So I've 
rewritten the code like this with a bit of algebraic rewriting:

char[] toBase58(ref Address a) pure nothrow @safe {
     static immutable symbols = "123456789" ~
                                "ABCDEFGHJKLMNPQRSTUVWXYZ" ~
                                "abcdefghijkmnopqrstuvwxyz";
     static assert(symbols.length == 58);

     auto result = new typeof(return)(34);
     foreach_reverse (ref ri; result) {
         uint c = 0;
         foreach (ref ai; a) {
             immutable d = (c % symbols.length) * 256 + ai;
             ai = d / symbols.length;
             c = d;
         }
         ri = symbols[c % symbols.length];
     }
     ...
}

Now it can be a little slower because the integer division and 
modulus has different divisors, so perhaps they can't be 
implemented with a little more than a single division, as before 
(I have not compared the assembly), but for the purposes of this 
code the performance difference is not a problem. Now the D type 
system is able to see that "ai" is always fitting in a ubyte, and 
there's no need for a cast. The compiler puts a safe implicit 
cast. This is awesome.

- - - - - - - - - - - - - -

But of course you often want more :-)

This is another case where the current D type system allows you 
to avoid a cast:

void main() {
     char['z' - 'a' + 1] arr;

     foreach (immutable i, ref c; arr)
         c = 'a' + i;
}

But if you want to use ranges and functional UFCS chains you 
currently need the cast:

void main() {
     import std.range, std.algorithm, std.array;

     char[26] arr = 26
                    .iota
                    .map!(i => cast(char)('a' + i))
                    .array;
}

In theory this program has the same compile-time information as 
the foreach case. In practice foreach is a built-in that enjoys 
more semantics than a iota+map.

Currently iota(26) loses the compile-time information about the 
range, so you can't do (note: the "max" attribute doesn't exists):

void main() {
     import std.range: iota;
     auto r = iota(26);
     enum ubyte m = r.max;
}

Currently the only way to keep that compile-time information is 
to use a template argument:

void main() {
     import std.range: tIota;
     auto r = tIota!26;
     enum ubyte m = r.max; // OK
}

But even if you write such tIota range, the map!() will lose the 
compile-time value range information. And even if you manage to 
write a map!() able to do it with template arguments, you have 
template bloat.

So there's a desire to manage the compile-time information (like 
value range information) of (number) literals without causing 
template bloat and without the need to use explicit template 
arguments.

Bye,
bearophile