The Matrix to end all Matrix classes (Let's dream!)
Don Clugston
dac at nospam.com.au
Wed Nov 21 03:01:35 PST 2007
Bill Baxter wrote:
> TomD wrote:
>> Bill Baxter Wrote:
>> [...]
>>>> The third case could probably in most cases be left out,
>>> I implemented the third case in Murray (nee MultiArray)
>>> www.dsource.org/projects/multiarray
>>>
>>> Mainly because I originally wanted an easy way to port my NumPy code to D.
>>> But I have to agree that after spending a lot of time on it, I would
>>> go with #2 if I were starting over. Fix the number of dimensions at
>>> compile time.
>>>
>>>> but the first two make an interesting separation. Knowing the size
>>>> at compile time is mostly an advantage for smaller matrices (eg 4x4,
>>>> as commonly used for geometric translations). Larger matrices (eg
>>>> 1000x1000) probably gain much less from compile time known sizes.
>>>> There is also the matter of template bloat that means that compile
>>>> time quantities are best left to the smaller numbers.
>>> For #1, I've never had a need for anything besides 2x2,3x3,4x4.
>>> Maybe occasionally things like 2x3 to represent affine
>>> transformations, but then interpretation starts getting in the way.
>>> You can't handle it like a generic 2x3 matrix. So to me for the
>>> compile time size case, you can pretty much get by with just a fixed
>>> set of classes. Of course if you can generate those efficiently from
>>> a big template, then that's great. But there are usually a lot of
>>> special cases. If you look at a geometry-oriented linalg lib like
>>> Helix you can find many little differences in the API between
>>> Matrix22 and Matrix33. They can be taken care of with static ifs,
>>> but the question is whether the loss of readability is worth it in the end.
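
The static-if approach mentioned above can be sketched roughly like this, assuming a simple templated value type (the names and layout are illustrative, not Helix's actual API):

```d
// Hypothetical fixed-size matrix template; not Helix's actual API.
struct Matrix(T, int N)
{
    T[N][N] data;

    static if (N == 2)
    {
        // 2x2-only member: closed-form determinant.
        T determinant()
        {
            return data[0][0]*data[1][1] - data[0][1]*data[1][0];
        }
    }
    static if (N == 3)
    {
        // 3x3-only member: cofactor-expansion determinant.
        T determinant()
        {
            return data[0][0]*(data[1][1]*data[2][2] - data[1][2]*data[2][1])
                 - data[0][1]*(data[1][0]*data[2][2] - data[1][2]*data[2][0])
                 + data[0][2]*(data[1][0]*data[2][1] - data[1][1]*data[2][0]);
        }
    }
}

alias Matrix22 = Matrix!(float, 2);
alias Matrix33 = Matrix!(float, 3);
```

Each instantiation only compiles the branch for its own N, so the per-size API differences live in one template, at the readability cost the post describes.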
>>
>> In 2D Finite Elements some other fixed size small arrays are common,
>> 6x6, 9x9, depending on the type of elements you use, also some
>> non-square ones.
>
> And wouldn't you know it. Right after I wrote that I sat down and
> started working on some code where I really needed 2x3 matrices. Doh! I
> guess I just never noticed needing them before.
>
>> [...]
>>>>> 4. For storage we need packed, triangular, symmetric, banded, and
>>>>> sparse
>>>>> matrix support. (Possibly even 'calculated' matrix support, for
>>>>> user-defined matrix types which generate entries on-demand).
>>>> For generality, at least sparse (and possibly calculated) are useful
>>>> for vectors and higher dimensional arrays too.
>>>>> 5. Can be generalized to tensors.
>>> Numbers 4 and 5 are a bit conflicting. There is no "triangular
>>> storage" for anything other than 2-D arrays. Same goes for the
>>> commonly used compressed matrix formats: they are 2D-specific.
>>
>> Well, that becomes something like "pyramidal" structure in 3D.
>> They are quite common in 3D partial differential equations, like
>> Laplace solvers, or in Fluid Mechanics.
>>
>>> Convenient usage for linear algebra also suggests that opMul should
>>> do linear algebraic multiplication, but that doesn't make sense for
>>> tensors. Changing opMul to do different things depending on the
>>> dimension seems inconsistent.
>>
>> For tensors we'd need something like ., :, and x (dot product, double dot
>> product and cross product). It may seem inconsistent, but it is math :-)
>
> So the thing to do seems to be to have different wrappers for the
> underlying storage methods that expose different operations.
>
> Or just different sets of functions that treat the storage in different
> ways. linalg.dot, tensor.dot, ...
>
> It's quite a large software engineering problem. But I'm looking
> forward to the day when Don is done writing it. :-)
Me too :-) Quite possibly it's all too hard.
Yet it seems to me that fundamentally, the storage class just needs to expose
how to get from one element to another efficiently.
e.g. for a matrix, expose some compile-time traits:
m.packingType --- row, column, transposable (at runtime it will always be
either row or column; generate code for both and choose between them at
runtime), or none (no optimisation potential).
and then, if row-based:
m.firstElementInRow(y)
m.lastElementInRow(y)
m.rowStride
and then optimal access is something like:
for (int i = 0; i < m.numRows; ++i) {
    for (auto t = &m[i][m.firstElementInRow(i)];
         t <= &m[i][m.lastElementInRow(i)]; t += m.rowStride) {
        // *t is m[i][j] for each stored column j of row i
    }
}
It all gets very complicated, but the saving grace is that it only needs to be
optimal for the common cases.
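
Pulling the pieces of this sketch together, a generic algorithm could dispatch on the storage's traits with static if. Everything here (the Packing enum, the DenseMatrix type, the trait names beyond those in the post) is an assumption about how such a design might look, not a worked-out API:

```d
// Assumed trait vocabulary, following the names used in the post.
enum Packing { row, column, transposable, none }

// One concrete storage class: dense, row-major.
struct DenseMatrix
{
    double[] data;                    // row-major storage
    size_t numRows, numColumns;
    enum packingType = Packing.row;   // compile-time trait
    enum rowStride = 1;               // step between elements within a row
    size_t firstElementInRow(size_t i) { return 0; }
    size_t lastElementInRow(size_t i)  { return numColumns - 1; }
    double* rowPtr(size_t i) { return &data[i * numColumns]; }
    double opIndex(size_t i, size_t j) { return data[i * numColumns + j]; }
}

// Generic algorithm: specialise at compile time on the storage's traits.
double sumAll(M)(ref M m)
{
    double s = 0;
    static if (M.packingType == Packing.row)
    {
        // Fast path: walk each row by pointer and stride, as in the
        // access pattern sketched above.
        foreach (i; 0 .. m.numRows)
            for (auto t = m.rowPtr(i) + m.firstElementInRow(i);
                 t <= m.rowPtr(i) + m.lastElementInRow(i);
                 t += m.rowStride)
                s += *t;
    }
    else
    {
        // Fallback: element-wise indexing, no layout assumptions.
        foreach (i; 0 .. m.numRows)
            foreach (j; 0 .. m.numColumns)
                s += m[i, j];
    }
    return s;
}
```

The point is that only the handful of common packings need the hand-tuned branch; everything else falls through to the generic one.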