How to tune numerical D? (matrix multiplication is faster in g++ vs gdc)

Tue Mar 5 07:01:26 PST 2013

On 03/04/2013 04:46 PM, jerro wrote:
>> A bit better version:
>> http://codepad.org/jhbYxEgU
>>
>> I think this code is good compared to the original (there are better
>> algorithms).
>
> You can make it much faster even without really changing the algorithm.
> Just by reversing the order of inner two loops like this:
>
> void matrixMult2(in int[][] m1, in int[][] m2, int[][] m3) pure nothrow {
>      foreach (immutable i; 0 .. m1.length)
>          foreach (immutable k; 0 .. m2[0].length)
>              foreach (immutable j; 0 .. m3[0].length)
>                  m3[i][j] += m1[i][k] * m2[k][j];
> }
>
> you can make the code much more cache friendly (because now you aren't
> iterating any matrix by column in the inner loop) and also allow the
> compiler to do auto vectorization. matrixMul2() takes 2.6 seconds on my
> machine and matrixMul()takes 72 seconds (both compiled with  gdmd -O
> -inline -release -noboundscheck -mavx).
>
> This isn't really relevant to the comparison with C++ in this thread, I
> just thought it may be useful for anyone writing matrix code.

This is a little faster for 2000x2000 matrices and typical cache size:

void matrixMult(in int[][] m1, in int[][] m2, int[][] m3){
     enum block=50; // optimal parameter depends on hardware
     foreach(ib;iota(0,m1.length,block))
     foreach(kb;iota(0,m2[0].length,block))
     foreach(i;iota(ib,min(m1.length,ib+block)))
     foreach(k;iota(kb,min(m2[0].length,kb+block)))
         m3[i][] += m1[i][k] * m2[k][];
}