Improving dot product for standard multidimensional D arrays

p.shkadzko p.shkadzko at gmail.com
Sun Mar 1 20:58:42 UTC 2020


Hello again,

Thanks to the previous thread on multidimensional arrays, I
managed to play around with pure D matrix representations and
even benchmark them a little against NumPy:

+-------------------------------------------------------------+------------+-----------+
|                          benchmark                          | time (sec) |  vs Numpy |
+-------------------------------------------------------------+------------+-----------+
| Sum of two [5000, 6000] int array of arrays                 | ~0.28      | x4.5      |
| Multiplication of two [5000, 6000] double array of arrays   | ~0.3       | x2.6      |
| Sum of two [5000, 6000] int struct matrices                 | ~0.039     | x0.6      |
| Multiplication of two [5000, 6000] double struct matrices   | ~0.135     | x1.2      |
| L2 norm of [5000, 6000] double struct matrix                | ~0.015     | x15       |
| Sort of [5000, 6000] double struct matrix (axis=-1)         | ~2.435     | x1.9      |
| Dot product of [500, 600]&[600, 500] double struct matrices | ~0.172     | --        |
+-------------------------------------------------------------+------------+-----------+

However, there is one benchmark I am still trying to make at
least somewhat comparable: the dot product of two struct
matrices, concretely the [A x B] @ [B x A] = [A, A] operation.
There is a dotProduct function in std.numeric, but it works with
1D ranges only.
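
For reference, a minimal sketch of what std.numeric.dotProduct
does on plain 1D slices (toy values of my own):

******************************************************************
import std.numeric : dotProduct;

unittest
{
     double[] a = [1.0, 2.0, 3.0];
     double[] b = [4.0, 5.0, 6.0];
     // 1*4 + 2*5 + 3*6 = 32
     assert(dotProduct(a, b) == 32.0);
}
******************************************************************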

After it became clear that arrays of arrays are not well suited
to representing multidimensional data, I used a struct to
represent a multidimensional array like so:

******************************************************************
import std.array : array;   // for .array in to2D
import std.range : chunks;  // for .chunks in to2D

struct Matrix(T)
{
     T[] data; // keep our data as a 1D array and reshape to 2D when needed
     int rows;
     int cols;
     // allow Matrix[] instead of Matrix.data[]
     alias data this;

     this(int rows, int cols)
     {
         this.data = new T[rows * cols];
         this.rows = rows;
         this.cols = cols;
     }

     this(int rows, int cols, T[] data)
     {
         assert(data.length == rows * cols);
         this.data = data;
         this.rows = rows;
         this.cols = cols;
     }

     T[][] to2D()
     {
         return this.data.chunks(this.cols).array;
     }

     /// Allow element 2D indexing, e.g. Matrix[row, col]
     T opIndex(in int r, in int c)
     {
         return this.data[toIdx(this, r, c)];
     }

}

pragma(inline) static int toIdx(T)(Matrix!T m, in int i, in int j)
{
     return m.cols * i + j;
}
******************************************************************
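
A quick usage sketch of this struct, with toy values of my own,
just to show the indexing and reshaping:

******************************************************************
unittest
{
     // a 2x3 matrix backed by a flat 1D array
     auto m = Matrix!int(2, 3, [1, 2, 3,
                                4, 5, 6]);
     assert(m[1, 2] == 6);                       // row 1, col 2
     assert(m.to2D() == [[1, 2, 3], [4, 5, 6]]); // reshaped view
}
******************************************************************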

And here is the dot product function:

******************************************************************
Matrix!T matrixDotProduct(T)(Matrix!T m1, Matrix!T m2)
in
{
     assert(m1.cols == m2.rows); // inner dimensions must match: [A x B] @ [B x C]
}
do
{
     Matrix!T m3 = Matrix!T(m1.rows, m2.cols);
     // new T[] leaves floating-point elements at T.nan,
     // so zero the result buffer before accumulating into it
     m3.data[] = 0;

     // naive i-j-k loop: dot row i of m1 with column j of m2
     for (int i; i < m1.rows; ++i)
     {
         for (int j; j < m2.cols; ++j)
         {
             for (int k; k < m2.rows; ++k)
             {
                 m3.data[toIdx(m3, i, j)] += m1[i, k] * m2[k, j];
             }
         }
     }
     return m3;
}
******************************************************************
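
A small hand-checkable usage sketch (my own toy values, not from
the benchmark) to make the shapes concrete:

******************************************************************
unittest
{
     // [2 x 3] @ [3 x 2] = [2 x 2]
     auto m1 = Matrix!double(2, 3, [1.0, 2.0, 3.0,
                                    4.0, 5.0, 6.0]);
     auto m2 = Matrix!double(3, 2, [ 7.0,  8.0,
                                     9.0, 10.0,
                                    11.0, 12.0]);
     auto m3 = matrixDotProduct(m1, m2);
     assert(m3.data == [ 58.0,  64.0,
                        139.0, 154.0]);
}
******************************************************************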

However, running matrixDotProduct on two 5000x6000 struct
Matrices took ~20 min, while the 500x600 case took only ~0.172
sec. So I wondered whether there is something really wrong with
the matrixDotProduct function.

I can see that accessing the appropriate element of Matrix.data
is costly due to the toIdx call, but I can hardly explain why it
gets so much more costly: scaling naively, 5000x6000 is 10x
larger in each dimension, i.e. roughly 1000x the work of the
~0.172 sec case, which would be about 3 minutes rather than 20.
Maybe there is a better way to do it after all?
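
For example, here is a minimal, untested sketch of one variant I
am considering: reordering the loops to i-k-j so that both m1 and
m2 are walked row by row and the index arithmetic is hoisted out
of the innermost loop:

******************************************************************
Matrix!T matrixDotProductIKJ(T)(Matrix!T m1, Matrix!T m2)
in
{
     assert(m1.cols == m2.rows);
}
do
{
     Matrix!T m3 = Matrix!T(m1.rows, m2.cols);
     m3.data[] = 0;

     for (int i; i < m1.rows; ++i)
     {
         for (int k; k < m1.cols; ++k)
         {
             const T a = m1.data[i * m1.cols + k]; // m1[i, k], fixed for the j loop
             const int rowM2 = k * m2.cols;        // start of row k in m2
             const int rowM3 = i * m3.cols;        // start of row i in m3
             for (int j; j < m2.cols; ++j)
             {
                 // m2 and m3 are both walked sequentially here
                 m3.data[rowM3 + j] += a * m2.data[rowM2 + j];
             }
         }
     }
     return m3;
}
******************************************************************

With this ordering the innermost loop touches m2.data and m3.data
contiguously, whereas the i-j-k version above jumps a whole row of
m2 on every step of k.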

