The Matrix to end all Matrix classes (Let's dream!)

Bill Baxter dnewsgroup at billbaxter.com
Tue Nov 20 16:51:23 PST 2007


Oskar Linde wrote:
> Don Clugston wrote:
>> No, I don't have the perfect Matrix. I just want to explore the 
>> question of whether it exists.
> 
> A very interesting topic indeed. I will just give some quick thoughts to 
> start with. First of all, a matrix is (generally) 2-dimensional, but 
> there are many uses for higher-dimensional structures as well. Also, the 
> 1-dimensional vector is a useful structure that could be handled in the 
> same generic framework. Is the target just 2-dimensional matrices or 
> also higher dimensional arrays?
> 
> The matrix as a mathematical structure is quite different in semantics 
> and use from generic multi-dimensional arrays and should perhaps be 
> treated separately.
> 
> To avoid confusion, I will use "size" to refer to the number of elements 
> of a matrix/array and "dimensionality" to refer to the number of 
> dimensions of an array (a vector is 1-dimensional, a matrix is 
> 2-dimensional).
> 
> First, there is the separation of compile time and run time. One can 
> identify (at least) 3 different needs, with different requirements for 
> run-time/compile-time dynamics:
> 
> * Small static matrices/arrays (where the size and dimensionality are 
> known at compile time)
> * Dynamic matrices/arrays (where the sizes are determined at run-time, 
> but dimensionality is fixed at compile time)
> * Fully dynamic multidimensional arrays (where the dimensionality is 
> determined at run time)
> 
> The third case could probably in most cases be left out, 

I implemented the third case in Murray (nee MultiArray)
    www.dsource.org/projects/multiarray

Mainly because I originally wanted an easy way to port my NumPy code to D.
But I have to agree that after spending a lot of time on it, I would go 
with #2 if I were starting over.  Fix the number of dimensions at 
compile time.
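
For concreteness, case #2 boils down to something like the sketch below. 
This is just a minimal illustration with made-up names, not Murray's 
actual API:

    // Sketch only: dimensionality N is a compile-time parameter,
    // extents and strides are chosen at run time (Oskar's case #2).
    struct NDArray(T, int N)
    {
        T[]       data;     // flat storage, length == product of shape
        size_t[N] shape;    // run-time extents along each dimension
        size_t[N] strides;  // element strides (row-major by default)

        // flat offset of a multi-dimensional index
        size_t offset(size_t[N] idx)
        {
            size_t off = 0;
            foreach (d, i; idx)
                off += i * strides[d];
            return off;
        }

        T opIndex(size_t[N] idx) { return data[offset(idx)]; }
        void opIndexAssign(T v, size_t[N] idx) { data[offset(idx)] = v; }
    }

All the looping code can then be written once, generically over N, while 
the compiler still knows the number of dimensions.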

> but the first 
> two make an interesting separation. Knowing the size at compile time is 
> mostly an advantage for smaller matrices (eg 4x4, as commonly used for 
> geometric translations). Larger matrices (eg 1000x1000) probably gain 
> much less from compile time known sizes. There is also the matter of 
> template bloat that means that compile time quantities are best left to 
> the smaller numbers.

For #1, I've never had a need for anything besides 2x2, 3x3, and 4x4. 
Maybe occasionally something like a 2x3 to represent an affine 
transformation, but then the interpretation starts getting in the way: 
you can't handle it like a generic 2x3 matrix.  So to me, for the 
compile-time-size case, you can pretty much get by with a fixed set of 
classes.  Of course, if you can generate those efficiently from one big 
template, that's great, but there are usually a lot of special cases. 
If you look at a geometry-oriented linalg lib like Helix, you can find 
many little differences in the API between Matrix22 and Matrix33.  They 
can be taken care of with static ifs, but the question is whether that's 
worth the loss of readability in the end.
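
To make the static-if point concrete, here is the kind of thing I mean. 
This is a hypothetical sketch, not Helix's actual code:

    struct Mat(T, int N)
    {
        T[N][N] m;

        // det() needs a different hand-written body per dimension,
        // so the "one big template" turns into a pile of static ifs
        static if (N == 2)
        {
            T det() { return m[0][0]*m[1][1] - m[0][1]*m[1][0]; }
        }
        else static if (N == 3)
        {
            T det()
            {
                return m[0][0]*(m[1][1]*m[2][2] - m[1][2]*m[2][1])
                     - m[0][1]*(m[1][0]*m[2][2] - m[1][2]*m[2][0])
                     + m[0][2]*(m[1][0]*m[2][1] - m[1][1]*m[2][0]);
            }
        }
    }

Multiply that by every other dimension-specific member (rotations, cross 
products, and so on) and the single template stops being easier to read 
than a few separate classes.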

> 
>> Requirements:
>> 1. Must be compatible with future multi-dimensional array syntax such as
>> double [7,5] m;
>> 2. Must be theoretically capable of optimal performance.
> 
> Seems reasonable.
> 
>> 3. Needs to support slices. This means we also need to support strided
>> vectors. Therefore we also need matrix slices, and matrix references.
> 
> Yes, both strided views and non-strided references are definitely needed.
> 
>> Probably distinguishing between the T[] and T[new] concepts.
> 
> Such a distinction is definitely an advantage. For large matrices, 
> ownership of memory becomes important, as such structures may be too 
> large to be able to rely on the GC.

Sorry, I forgot what this distinction was.  Can someone remind me?

>> 4. For storage we need packed, triangular, symmetric, banded, and sparse
>> matrix support. (Possibly even 'calculated' matrix support, for
>> user-defined matrix types which generate entries on-demand).
> 
> For generality, at least sparse (and possibly calculated) are useful for 
> vectors and higher dimensional arrays too.
>
>> 5. Can be generalized to tensors.

Numbers 4 and 5 conflict a bit.  There is no "triangular storage" for 
anything other than 2-D arrays.  The same goes for the commonly used 
compressed sparse matrix formats: they are 2-D specific.
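
For example, the standard packed layout for a lower-triangular matrix 
stores row i in i+1 slots, which is an inherently 2-D scheme:

    // flat index of element (i,j), j <= i, in packed
    // lower-triangular storage
    size_t packedIndex(size_t i, size_t j)
    {
        assert(j <= i);
        return i * (i + 1) / 2 + j;
    }

There's no natural N-D generalization of that layout.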

Convenient usage for linear algebra also suggests that opMul should do 
linear algebraic multiplication, but that doesn't make sense for 
tensors.  Changing opMul to do different things depending on the 
dimension seems inconsistent.

My feeling is that the generic N-D array concept needs to be separated 
from the linear algebra concepts.

However, I think all these needs can be accommodated by layering things 
appropriately.

first Storage
then  N-D array using some specific storage
then  Matrix using 2-D array, and Vector using 1-D array.

Implementations of the storage concept can be dimension-specific (like 
CCS sparse), or generic (like strided memory).

N-D array will have only operators that make sense for N-D arrays 
without reference to specific interpretations.  So opMul would be 
element-wise multiplication if anything.

Matrix and Vector can implement the linear algebraic rules.
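
In code, the layering might look something like the rough sketch below. 
The names are placeholders, not a worked-out design:

    // Layer 1: a storage concept (here, plain strided memory)
    struct StridedStorage(T)
    {
        T[] data;
        size_t stride;                      // gap between elements
        T get(size_t i)         { return data[i * stride]; }
        void set(size_t i, T v) { data[i * stride] = v; }
    }

    // Layer 2: a 2-D array over some storage; only element-wise
    // operations live here, no linear algebra
    struct Array2D(T, Storage)
    {
        Storage store;
        size_t rows, cols;
        T opIndex(size_t r, size_t c) { return store.get(r*cols + c); }
        // element-wise opMul, slicing, etc. would go here
    }

    // Layer 3: Matrix wraps a 2-D array and adds the linear
    // algebraic interpretation (opMul == matrix product, etc.)
    struct Matrix(T)
    {
        Array2D!(T, StridedStorage!(T)) a;
        // Matrix opMul(Matrix rhs) { /* row-by-column product */ }
    }

A CCS sparse storage type would plug in at layer 1 for the 2-D case 
only, while strided memory works for any N.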

>> Yet I'm not convinced that it's impossible. What are the other
>> requirements?
> 
> Naturally, different element types (as well as UDTs). 

I think the UDT part in particular is difficult to do well without 
reference return values.  Even now, with a complex matrix, you have the 
problem that you can't set the real part of an element using 
opIndexAssign:
   mat[3,2].re = 5.0;  // oops: you just set the real part of a copy!

The same goes for in-place updates like mat[3,2] += 1.0, with any 
element type.  But that's old news.
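
Here's a minimal demonstration of the problem with a made-up complex 
matrix type.  I believe in current D this compiles and silently 
modifies a temporary:

    import std.stdio;

    struct Complex { double re = 0, im = 0; }

    struct CMatrix
    {
        Complex[] data;
        size_t cols;
        // opIndex returns by VALUE: the root of the problem
        Complex opIndex(size_t r, size_t c) { return data[r*cols + c]; }
        void opIndexAssign(Complex v, size_t r, size_t c)
        {
            data[r*cols + c] = v;
        }
    }

    void main()
    {
        CMatrix m;
        m.cols = 4;
        m.data = new Complex[16];

        m[3, 2].re = 5.0;        // modifies a copy, not the element
        writefln(m[3, 2].re);    // prints 0, not 5

        // the workaround: read, modify, write the whole element back
        Complex t = m[3, 2];
        t.re = 5.0;
        m[3, 2] = t;
    }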


I think you can achieve something decent using current D, but the syntax 
will not be ideal.  And doing all of the above would be a very large effort.

I started collecting a list of possible libraries to draw inspiration 
from over at the Murray forum: 
http://www.dsource.org/forums/viewtopic.php?t=3307&sid=b05ae6bad8b91d51286b0096cd4ef9d2

--bb


