Optionally strongly typed array indexes
bearophile via Digitalmars-d
digitalmars-d at puremagic.com
Tue Jun 3 14:19:37 PDT 2014
This language feature is absent in D, and it's present in Ada and
partially present in ObjectPascal. I think it's significant.
All the following ideas are preliminary (uncooked, primitive),
they are just ideas (and quite probably there are smart people
that can invent better things).
This is an example of wrong code, it allocates and initializes a
matrix with three rows and two columns:
void main() {
import std.stdio: writeln;
auto mat = new int[][](3, 2);
writeln(mat);
foreach (immutable i; 0 .. 2)
foreach (immutable j; 0 .. 3)
mat[i][j] = i * 10 + j; // Wrong.
writeln(mat);
}
The out of bound array access bug is found at run-time with this
error followed by a stack trace:
core.exception.RangeError at test.d(8): Range violation
The Ada language allows you to spot that bug at compile time.
Note: in D you usually avoid that bug writing the code like this.
But this is not enough in more complex situations:
foreach (immutable i, row; mat)
foreach (immutable j, ref m; row)
m = i * 10 + j;
If you give strong types to the array indexes, the code becomes
more self-documenting, and the compiler can catch some more of
your mistakes. In D the associative array type syntax already
gives a type to the indexes, so I have to tell apart the the case
of associative array from the case of normal dynamic/fixed-size
array with a strongly typed index. This will make the syntax
worse.
This is a first possible syntax (it looks ugly):
void main() {
auto M = new int[TR = @typed][TC = @typed](3, 2);
foreach (immutable TR i; 0 .. 2)
foreach (immutable TC j; 0 .. 3)
M[i][j] = i * 10 + j; // Wrong.
}
@typed used inside the [] means that array has strongly typed
(size_t) index. "TR = @typed" means that TR is the aliased name
of such type (so you can think of it as "[alias TR = @typed]").
Now such program gives two compile time errors (type mismatch on
the j and i indexes).
Other examples of allocation of 1D arrays with strongly typed
indexes:
auto v1 = new int[TV = @typed](10);
int[TV = @typed 10] v2; // fixed-size.
An usage example in a simple function that performs a 2D matrix
transposition:
T[][] transpose(T)(in T[TC = @typed][TR = @typed] m)
pure nothrow @safe {
auto r = new T[@typed(TR)][@typed(TC)](m[0].length, m.length);
foreach (immutable nr, const row; m)
foreach (immutable nc, immutable c; row)
r[nc][nr] = c;
return r;
}
"@typed(TR)" means that for this array I am using an already
defined index type named TR.
In theory you can also infer the type of the index with the
template, but I am not sure how much useful this is:
void foo(TI)(int[@typed(TI)] data) {}
In that transpose function you can also see that you can assign a
matrix with typed indexes to one without index types:
int[TI = @typed 10] a;
int[10] b = a; // OK.
a = b; // OK.
This transpose returns a matrix with untyped indexes to simplify
a little the use of the resulting matrix. But if you iterate with
a foreach on such result, the index types are inferred, so it's
not a big problem.
Probably there is also some need for a trait to get the type of
the index of an array (it could return size_t if it's untyped):
static assert(is(__trait(index_type, a) == TI));
(In Ada you can also get the range of an array, so such interval
types also keep some other compile-time information. But I think
this is not essential for D, so I have not included this
information.)
--------------------------
I have used both dynamic languages (like Python) and strongly
typed languages (like D, Haskell). I have seen that both have
some advantages and disadvantages. In the Haskell world lot of
people use types to guide their coding, but when I am not using
Haskell I prefer dynamic typing in small programs or when I
devise a complex algorithm, and strong static typing in larger
programs or when I already have the basic code written and I want
to be more sure of its correctness. So I like a language like D
that gives me the freedom to use more precise types or less
precise types according to my current needs. Optional strong
types for arrays are meant for situations where I want to be more
sure of the code correctness, or in larger programs, or when the
complexity of a data structure or the intricacy of a piece of
code require me to put down lot of precise types to avoid losing
control of what I am doing.
Strong index types allow you to avoid mixing by mistake index
variables when you have more than one array, or when you have 2D
or 3D matrices and you need to not mix rows with columns. In my D
programming I sometimes mix the indexes, and when I am lucky I
find the bug at runtime (but if your 2D matrix is a square it's
less immediate to spot the bug at run-time).
Bye,
bearophile
More information about the Digitalmars-d
mailing list