Designing a matrix library for D

bearophile via Digitalmars-d <digitalmars-d at puremagic.com>
Mon Jun 23 09:45:27 PDT 2014


I'd like a matrix library for D, able to handle sparse matrices
too. But first I'd like to discuss its API and general design a
bit.

---------------------------

I think a starting point for the design is visible in this paper,
"Optimizing Array Accesses in High Productivity Languages" by
Mackale Joyner et al.:
http://www.cs.rice.edu/~vs3/PDF/hpcc07-176.pdf

The paper is about optimization, but what's interesting to me is
that it also shows the API and usage of the matrices of the X10
language, which was designed for heavy numerical computations.

X10's collections are based on "points" and "regions".

A Point is a vector of N keys that index a collection (like the
(row, col) pair for a matrix, or the key of an associative
array). If N is a compile-time constant we lose some flexibility
but gain efficiency. In theory we want both.
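As a rough D sketch (the Point name and layout are mine, not from the paper), a point with compile-time rank N could look like this:

```d
// Hypothetical sketch: a Point with compile-time rank N.
struct Point(size_t N) {
    size_t[N] coords;

    enum rank = N;

    // Coordinate along dimension i.
    size_t opIndex(size_t i) const { return coords[i]; }
}
```

With N as a template parameter the rank is fixed at compile time; a run-time-rank variant would store the coordinates in a dynamic array instead.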

A Region defines part of a collection. In the simplest case it's
a slice of a 1D array, or it can be a 2D rectangular region of a
2D array, or more. So it's a set of M points. Usually you want M
to be a compile-time constant, but in some cases you want more
flexibility. Sometimes you want to intersect regions, or you
want non-convex regions, so the situation gets a little more
complex.

The paper lists some Region operations that I think can be
implemented with no changes to the D language:

R.rank ::= # dimensions in region
R.size() ::= # points in region
R.contains(P) ::= predicate if region R contains point P
R.contains(S) ::= predicate if region R contains region S
R.equal(S) ::= true if regions R and S contain the same set of points
R.rank(i) ::= projection of region R on dimension i (a one-dimensional region)
R.rank(i).low() ::= lower bound of i-th dimension of region R
R.rank(i).high() ::= upper bound of i-th dimension of region R
R.ordinal(P) ::= ordinal value of point P in region R
R.coord(N) ::= point in region R with ordinal value = N
R1 && R2 ::= region intersection (will be rectangular if R1 and R2 are rectangular)
R1 || R2 ::= union of regions R1 and R2 (may or may not be rectangular, compact)
R1 - R2 ::= region difference (may or may not be rectangular, compact)
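A minimal D sketch of a few of these operations (my own names, restricted to rectangular regions; note that D doesn't allow overloading &&, so intersection is a named method here):

```d
// Hypothetical sketch: a rectangular region of compile-time rank N,
// stored as inclusive per-dimension [low, high] bounds.
struct Region(size_t N) {
    size_t[N] low, high;

    enum rank = N;

    // R.size() ::= # points in region
    size_t size() const {
        size_t s = 1;
        foreach (i; 0 .. N)
            s *= high[i] - low[i] + 1;
        return s;
    }

    // R.contains(P) ::= predicate if region R contains point P
    bool contains(const size_t[N] p) const {
        foreach (i; 0 .. N)
            if (p[i] < low[i] || p[i] > high[i])
                return false;
        return true;
    }

    // R1 && R2 ::= region intersection; && itself is not
    // overloadable in D, so a named method is used instead.
    Region intersect(const Region r) const {
        Region result;
        foreach (i; 0 .. N) {
            result.low[i]  = low[i]  > r.low[i]  ? low[i]  : r.low[i];
            result.high[i] = high[i] < r.high[i] ? high[i] : r.high[i];
        }
        return result;
    }
}
```

Union and difference are trickier because their results may not be rectangular, which suggests a second, less regular region representation in the library.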


The paper also shows some ideas for array operations:

A.rank ::= # dimensions in array
A.region ::= index region (domain) of array
A.distribution ::= distribution of array A
A[P] ::= element at point P, where P belongs to A.region
A | R ::= restriction of array onto region R (returns copy of subarray)
A.sum(), A.max() ::= sum/max of elements in array
A1 <op> A2 ::= result of applying a point-wise op on array elements,
    when A1.region = A2.region (<op> can include +, -, *, and /)
A1 || A2 ::= disjoint union of arrays A1 and A2
    (A1.region and A2.region must be disjoint)
A1.overlay(A2) ::= array with region A1.region || A2.region,
    with element value A2[P] for all points P in A2.region
    and A1[P] otherwise
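Some of these array operations map naturally to D operator overloading. A sketch with a hypothetical dense 2D array type (row-major storage, my own naming):

```d
import std.algorithm : sum, maxElement;

// Hypothetical sketch: a dense 2D array with point-wise binary
// operations (A1 <op> A2) and the sum/max reductions from the paper.
struct Array2(T) {
    size_t rows, cols;
    T[] data;               // row-major, length == rows * cols

    // A[P] ::= element at point (r, c).
    ref T opIndex(size_t r, size_t c) { return data[r * cols + c]; }

    T sum() const { return data.sum; }
    T max() const { return data.maxElement; }

    // A1 <op> A2 ::= point-wise op; both arrays must cover the
    // same region (here: have the same shape).
    Array2 opBinary(string op)(const Array2 rhs) const
    if (op == "+" || op == "-" || op == "*" || op == "/")
    {
        assert(rows == rhs.rows && cols == rhs.cols);
        auto result = Array2(rows, cols, new T[data.length]);
        foreach (i, x; data)
            mixin("result.data[i] = x " ~ op ~ " rhs.data[i];");
        return result;
    }
}
```

The disjoint union and overlay operations would need a genuine region type rather than a plain (rows, cols) shape, which is where the Region abstraction above earns its keep.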

---------------------------

I've found a second source of ideas in this page:
http://dis.um.es/~alberto/hmatrix/static.html

It's a Haskell matrix library named "hmatrix". It's meant to be
"safe" because it performs "automatic inference and compile-time
checking of dimensions in matrix and vector operations". This
means that in many cases arrays and matrices have a size known
at compile time (so it's part of the type) and the library
statically verifies that the sizes match:

(matrix
   [ 2.0, 0.0, -1.0
   , 1.0, 1.0,  7.0
   , 5.0, 3.0,  1.0
   , 2.0, 8.0,  0.0 ] :: L 4 3)
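D templates can express the same kind of static dimension check. A minimal sketch with a hypothetical Mat type (not hmatrix's API):

```d
// Hypothetical sketch: a matrix whose dimensions are part of its
// type, so mismatched multiplications are rejected at compile time.
struct Mat(size_t R, size_t C) {
    double[R * C] data;     // row-major

    ref double opIndex(size_t r, size_t c) { return data[r * C + c]; }

    // (R x C) * (C x K) -> (R x K); any other shape fails to compile.
    Mat!(R, K) opBinary(string op : "*", size_t K)(const Mat!(C, K) rhs) const {
        Mat!(R, K) result;
        result.data[] = 0;
        foreach (i; 0 .. R)
            foreach (k; 0 .. C)
                foreach (j; 0 .. K)
                    result[i, j] += data[i * C + k] * rhs.data[k * K + j];
        return result;
    }
}
```

Here `Mat!(4, 3)` plays the role of hmatrix's `L 4 3`: multiplying it by anything other than a `Mat!(3, K)` is a compile-time error.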


But:

>The size of the result of certain computations can only be known 
>at run time. For instance, the dimension of the nullspace of a 
>matrix depends on its rank, which is a nontrivial property of 
>its elements:<

The library solves this problem using a GHC compiler language
extension:

>By hiding the unknown dimension in an existential type we can 
>still compute safely with a strongly typed nullspace.<

Docs about Haskell existential types:

http://www.haskell.org/haskellwiki/Existential_type

This is a normal Haskell record with specified types:

data Worker b x y = Worker {buffer :: b, input :: x, output :: y}

That is similar to this D code:

class Worker(B, X, Y) {
    B buffer;
    X input;
    Y output;
}


Existential types can be used to "hide" a type on the left:

data Worker x y = forall b. Buffer b =>
     Worker {buffer :: b, input :: x, output :: y}

This allows one to "hide" the size of an array when its length
is not known at compile time. I am not a Haskell expert, but I
think this is a kind of type erasure (where the erased type is
the size of the array).

With this strategy the compiler can, in most cases, enforce at
compile time that arrays have the right types, while still
allowing run-time sizes in the few remaining cases.
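In D a similar effect can be approximated with an interface that erases the compile-time size (a sketch, with names of my own invention):

```d
// Hypothetical sketch: type-erasing the compile-time length of a
// vector, roughly analogous to hiding a dimension in an
// existential type.
interface AnyVec {
    size_t length() const;
    double get(size_t i) const;
}

class Vec(size_t N) : AnyVec {
    double[N] data;

    this(double[N] data) { this.data = data; }

    size_t length() const { return N; }
    double get(size_t i) const { return data[i]; }
}

// Code that only needs the erased view works for any N.
double total(const AnyVec v) {
    double s = 0;
    foreach (i; 0 .. v.length)
        s += v.get(i);
    return s;
}
```

Each concrete `Vec!N` keeps its statically sized storage, but a function like the nullspace computation could return an `AnyVec` whose length is only known at run time.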

In D this idea could avoid the template bloat caused by all the
different array dimensions, while still allowing optimizations
based on compile-time knowledge of sizes at the points where the
programmer asks for more efficiency (like unrolling on request
in certain zones when the size is just 4).
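For example (a sketch of mine, not a proposal for the library API), a dot product whose length is a template parameter can be unrolled explicitly:

```d
// Hypothetical sketch: the length N is a compile-time constant,
// so static foreach expands the loop body N times with constant
// indexes, with no run-time loop overhead.
double dotStatic(size_t N)(const double[N] a, const double[N] b) {
    double s = 0;
    static foreach (i; 0 .. N)
        s += a[i] * b[i];
    return s;
}
```

The erased, run-time-sized variant would use an ordinary foreach over a slice; the library could offer both and let the programmer pick.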

Is all this possible with the type system of D? Perhaps it's
possible with enum preconditions or some of the ideas discussed
in the "Value range propagation for if-else" thread. Improving
the D type system could be an option, if it turns out to be
useful.

Bye,
bearophile
