Improving dot product for standard multidimensional D arrays

Tue Mar 3 22:42:21 UTC 2020

On Sunday, 1 March 2020 at 20:58:42 UTC, p.shkadzko wrote:
> pragma(inline) static int toIdx(T)(Matrix!T m, in int i, in int j)
> {
>     return m.cols * i + j;
> }

This is row-major order [1]. BTW: Why don't you make toIdx a 
member of Matrix? It saves one parameter. You may also define 
opIndex as

    ref T opIndex(in int r, in int c)

Then the innermost summation becomes more readable:

    m3[i, j] += m1[i, k] * m2[k, j];
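
To make that concrete, here is a minimal sketch of what such a 
Matrix could look like. The field names (data, rows, cols) and the 
constructor are my assumptions, since your actual Matrix definition 
was not quoted:

```d
// Hypothetical Matrix layout: flat row-major storage plus dimensions.
struct Matrix(T)
{
    T[] data;
    int rows, cols;

    this(int rows, int cols)
    {
        this.rows = rows;
        this.cols = cols;
        data = new T[](rows * cols);
        // floating-point elements default to NaN in D, so zero them
        data[] = 0;
    }

    // Row-major index, now a member: no Matrix parameter needed.
    size_t toIdx(in int r, in int c) const
    {
        return cast(size_t) cols * r + c;
    }

    // Returning by ref makes m[r, c] usable for reads, writes and +=.
    ref T opIndex(in int r, in int c)
    {
        return data[toIdx(r, c)];
    }
}
```

Because opIndex returns by ref, no separate opIndexAssign or 
opIndexOpAssign is needed for the summation line above.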

How about performing an in-place transposition of m2 before 
performing the dot product? Then you can rewrite the innermost 
loop:

    m3[i, j] += m1[i, k] * m2[j, k]; // note: j and k swapped
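
A rough sketch of the whole idea follows. To keep it standalone it 
works on plain row-major double[] slices instead of your Matrix 
type, and it transposes into a copy rather than in place (an 
in-place transposition is only straightforward for square 
matrices). The function names transposed and dotTransposed are made 
up for this example:

```d
// Returns a transposed copy of a rows x cols row-major matrix.
double[] transposed(const double[] src, size_t rows, size_t cols)
{
    auto t = new double[](rows * cols);
    foreach (r; 0 .. rows)
        foreach (c; 0 .. cols)
            t[c * rows + r] = src[r * cols + c];
    return t;
}

// Multiplies a (n x m) by b (m x p) and returns the n x p result.
double[] dotTransposed(const double[] a, const double[] b,
                       size_t n, size_t m, size_t p)
{
    auto bt = transposed(b, m, p); // bt is p x m
    auto result = new double[](n * p);
    foreach (i; 0 .. n)
        foreach (j; 0 .. p)
        {
            double sum = 0;
            // row i of a and row j of bt are both contiguous
            foreach (k; 0 .. m)
                sum += a[i * m + k] * bt[j * m + k];
            result[i * p + j] = sum;
        }
    return result;
}
```

The transposition costs one extra pass over m2, which is small 
compared to the triple loop once the matrices are reasonably large. 
Whether it pays off for your sizes is something to measure.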

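This should avoid the costly jumping through memory: with j and k 
swapped, the inner loop reads both operands from contiguous rows 
instead of striding down a column of m2. A good starting point for 
a performance analysis would be to look at the assembler code of 
the innermost loop.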

[1] https://en.wikipedia.org/wiki/Row_major