Optionally strongly typed array indexes

Tue Jun 3 14:19:37 PDT 2014

This language feature is absent from D; it's present in Ada and 
partially present in ObjectPascal. I think it's significant.

All the following ideas are preliminary (half-baked, primitive). 
They are just ideas, and quite probably there are smart people 
who can invent better things.

This is an example of wrong code. It allocates a matrix with 
three rows and two columns and then tries to initialize it:

void main() {
     import std.stdio: writeln;
     auto mat = new int[][](3, 2);
     writeln(mat);

     foreach (immutable i; 0 .. 2)
         foreach (immutable j; 0 .. 3)
             mat[i][j] = i * 10 + j; // Wrong.
     writeln(mat);
}

The out-of-bounds array access is found only at run-time, with 
this error followed by a stack trace:

core.exception.RangeError@test.d(8): Range violation

The Ada language allows you to spot that bug at compile time.

Note: in D you usually avoid that bug by writing the code like 
this, but this is not enough in more complex situations:

     foreach (immutable i, row; mat)
         foreach (immutable j, ref m; row)
             m = i * 10 + j;

If you give strong types to the array indexes, the code becomes 
more self-documenting, and the compiler can catch some more of 
your mistakes. In D the associative array type syntax already 
gives a type to the indexes, so I have to tell apart the case of 
an associative array from the case of a normal dynamic/fixed-size 
array with a strongly typed index. This will make the syntax 
worse.

This is a first possible syntax (it looks ugly):

void main() {
     auto M = new int[TR = @typed][TC = @typed](3, 2);

     foreach (immutable TR i; 0 .. 2)
         foreach (immutable TC j; 0 .. 3)
             M[i][j] = i * 10 + j; // Wrong.
}

@typed used inside the [] means that the array has a strongly 
typed index (a size_t underneath). "TR = @typed" means that TR is 
the aliased name of that index type (so you can think of it as 
"[alias TR = @typed]").

Now this program gives two compile-time errors (type mismatches 
on the i and j indexes).
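
For comparison, part of this checking can be approximated in 
today's D with library types. What follows is only a rough sketch 
and not the syntax proposed above: Index, TypedArray, RowTag, 
ColTag and makeMatrix are hypothetical names invented for this 
example, they are not in Phobos. Every index has to be wrapped by 
hand, which is exactly the noise a built-in feature would remove.

```d
import std.stdio : writeln;

// Hypothetical helper: a distinct size_t wrapper for each Tag type.
struct Index(Tag) {
    size_t value;
}

// Hypothetical helper: a slice that can be indexed only with an Idx.
struct TypedArray(T, Idx) {
    T[] data;

    ref inout(T) opIndex(Idx i) inout {
        return data[i.value];
    }

    size_t length() const @property {
        return data.length;
    }
}

struct RowTag {}
struct ColTag {}
alias TR = Index!RowTag; // row index type
alias TC = Index!ColTag; // column index type

alias Row = TypedArray!(int, TC);
alias Matrix = TypedArray!(Row, TR);

// Allocates nRows rows, each with nCols zero-initialized items.
Matrix makeMatrix(size_t nRows, size_t nCols) {
    auto rows = new Row[](nRows);
    foreach (ref r; rows)
        r = Row(new int[](nCols));
    return Matrix(rows);
}

void main() {
    auto m = makeMatrix(3, 2);

    foreach (immutable size_t i; 0 .. 3)
        foreach (immutable size_t j; 0 .. 2)
            m[TR(i)][TC(j)] = cast(int)(i * 10 + j);

    // Swapped index types do not compile.
    static assert(!__traits(compiles, m[TC(0)][TR(0)]));
    // A plain integer index does not compile either.
    static assert(!__traits(compiles, m[0]));

    writeln(m[TR(2)][TC(1)]);
}
```

The two static asserts show the point: the mixed-up accesses are 
rejected by the type checker instead of failing at run-time.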

Other examples of allocation of 1D arrays with strongly typed 
indexes:

auto v1 = new int[TV = @typed](10);
int[TV = @typed 10] v2; // fixed-size.

A usage example in a simple function that performs a 2D matrix 
transposition:

T[][] transpose(T)(in T[TC = @typed][TR = @typed] m)
pure nothrow @safe {
     auto r = new T[@typed(TR)][@typed(TC)](m[0].length, m.length);
     foreach (immutable nr, const row; m)
         foreach (immutable nc, immutable c; row)
             r[nc][nr] = c;
     return r;
}

"@typed(TR)" means that for this array I am using an already 
defined index type named TR.

In theory you can also infer the type of the index through a 
template, but I am not sure how useful this is:

void foo(TI)(int[@typed(TI)] data) {}

That transpose function also shows that you can assign a matrix 
with typed indexes to one without index types. The conversion 
works in both directions:

int[TI = @typed 10] a;
int[10] b = a; // OK.
a = b;         // OK.

This transpose returns a matrix with untyped indexes to make the 
resulting matrix a little simpler to use. But if you iterate over 
such a result with a foreach, the index types are inferred, so 
it's not a big problem.

Probably there is also some need for a trait to get the type of 
the index of an array (it could return size_t if it's untyped):

static assert(is(__traits(index_type, a) == TI));
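
A similar query can be sketched today as a library template, as 
long as the typed array is a library type too. This is only an 
illustration under that assumption: Index, TypedArray, ColTag and 
IndexTypeOf are hypothetical names made up here (only isArray 
comes from std.traits), and index_type above remains a proposed 
trait that does not exist.

```d
import std.traits : isArray;

// Hypothetical helpers, standing in for a built-in typed index.
struct Index(Tag) {
    size_t value;
}

struct TypedArray(T, Idx) {
    T[] data;

    ref inout(T) opIndex(Idx i) inout {
        return data[i.value];
    }
}

// Hypothetical trait: the index type of A.
// Built-in dynamic and fixed-size arrays give size_t.
template IndexTypeOf(A) {
    static if (is(A == TypedArray!(T, Idx), T, Idx))
        alias IndexTypeOf = Idx;
    else static if (isArray!A)
        alias IndexTypeOf = size_t;
    else
        static assert(0, "IndexTypeOf: " ~ A.stringof ~ " is not an array");
}

struct ColTag {}
alias TC = Index!ColTag;

static assert(is(IndexTypeOf!(TypedArray!(int, TC)) == TC));
static assert(is(IndexTypeOf!(int[10]) == size_t));
static assert(is(IndexTypeOf!(int[]) == size_t));

void main() {}
```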

(In Ada you can also get the range of an array, so such interval 
types also keep some other compile-time information. But I think 
this is not essential for D, so I have not included this 
information.)

--------------------------

I have used both dynamic languages (like Python) and statically 
typed languages (like D and Haskell). I have seen that both have 
some advantages and disadvantages. In the Haskell world a lot of 
people use types to guide their coding, but when I am not using 
Haskell I prefer dynamic typing in small programs or when I 
devise a complex algorithm, and strong static typing in larger 
programs or when I already have the basic code written and I want 
to be more sure of its correctness. So I like a language like D 
that gives me the freedom to use more precise types or less 
precise types according to my current needs. Optional strong 
types for array indexes are meant for situations where I want to 
be more sure of the code correctness, or for larger programs, or 
for when the complexity of a data structure or the intricacy of a 
piece of code requires me to put down a lot of precise types to 
avoid losing control of what I am doing.

Strong index types help you avoid mixing up index variables by 
mistake when you have more than one array, or when you have 2D or 
3D matrices and you must not mix rows with columns. In my D 
programming I sometimes mix up the indexes, and when I am lucky I 
find the bug at run-time (but if your 2D matrix is square, the 
bug is harder to spot at run-time).
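
The square matrix case can be shown with a small sketch in plain 
D. fillSwapped is a hypothetical function written only for this 
demonstration; it contains the index mix-up on purpose.

```d
import std.stdio : writeln;

// Fills the matrix, but swaps the two indexes by mistake.
int[][] fillSwapped(size_t nRows, size_t nCols) {
    auto mat = new int[][](nRows, nCols);
    foreach (immutable i; 0 .. nRows)
        foreach (immutable j; 0 .. nCols)
            mat[j][i] = cast(int)(i * 10 + j); // Bug: should be mat[i][j].
    return mat;
}

void main() {
    // Square case: no RangeError, the result is silently transposed.
    writeln(fillSwapped(3, 3)); // [[0, 10, 20], [1, 11, 21], [2, 12, 22]]

    // fillSwapped(3, 2) would instead stop with a RangeError
    // (when array bounds checking is enabled).
}
```

With 3 rows and 3 columns every access stays in bounds, so the 
only symptom is wrong data.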

Bye,
bearophile