How to tune numerical D? (matrix multiplication is faster in g++ vs gdc)

J not_listed at not.not.listed
Mon Mar 4 22:42:48 PST 2013


On Monday, 4 March 2013 at 15:57:42 UTC, jerro wrote:
>> matrixMul2() takes 2.6 seconds on my machine and 
>> matrixMul() takes 72 seconds (both compiled with gdmd -O
>> -inline -release -noboundscheck -mavx).

Thanks, Jerro. You made me realize that help from the experts
could be quite useful. I plugged in a call to the BLAS matrix
multiply routine (dgemm), which SciD conveniently binds.

The result? My 2000x2000 matrix multiply went from 98 seconds
down to 1.8 seconds, roughly a 50x speedup. It's just hilariously
faster to use 20 years of numerical experts' optimized code than
to try to write your own.


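// Screaming-fast version: calls BLAS dgemm for a ~50x speedup over
// the naive code.
//
// dgemm computes C = alpha*A*B + beta*C on column-major arrays, so
// this assumes Multipliable stores its elements column by column.
Multipliable!(T) mmult2(T)(ref Multipliable!(T) m1,
                          ref Multipliable!(T) m2,
                          ref Multipliable!(T) m3) {
     // dgemm_ is the double-precision routine; other element types
     // would need a different BLAS routine (e.g. sgemm_ for float).
     static assert(is(T == double), "mmult2 only supports double");
     assert(m1.cols == m2.rows);
     assert(m3.rows == m1.rows && m3.cols == m2.cols);

     m3.array[] = 0; // not strictly needed: with beta == 0, C is not read

     char ntran = 'N';  // no transpose for either A or B
     double one = 1.0;
     double zero = 0.0;
     int nrow = cast(int)m1.rows;
     int ncol = cast(int)m1.cols;
     int mcol = cast(int)m2.cols;

     scid.bindings.blas.blas.dgemm_(&ntran,        // transa
                                    &ntran,        // transb
                                    &nrow,         // m
                                    &mcol,         // n
                                    &ncol,         // k
                                    &one,          // alpha
                                    m1.array.ptr,  // A
                                    &nrow,         // lda
                                    m2.array.ptr,  // B
                                    &ncol,         // ldb
                                    &zero,         // beta
                                    m3.array.ptr,  // C
                                    &nrow,         // ldc
                                    1,             // transa_len: 'N' is one char
                                    1);            // transb_len
     return m3;
}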
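To check on small inputs what the dgemm_ call above actually computes, here is a minimal plain-D sketch of the same operation, C = alpha*A*B + beta*C with no transposes, on column-major arrays with explicit leading dimensions. The function name refGemm is hypothetical (it is not part of SciD), and it assumes the same column-major layout the BLAS call relies on. The j-l-i loop order mirrors the Netlib reference DGEMM for the no-transpose case; it is still a naive triple loop and nowhere near tuned BLAS speed, which is rather the point of the post.

```d
// Hypothetical reference version of the no-transpose dgemm operation:
// C = alpha*A*B + beta*C, with A (m x k), B (k x n), C (m x n), all
// column-major. Element (i, j) with leading dimension ld is at i + j*ld.
import std.stdio : writeln;

void refGemm(size_t m, size_t n, size_t k,
             double alpha, const(double)[] a, size_t lda,
             const(double)[] b, size_t ldb,
             double beta, double[] c, size_t ldc)
{
    assert(lda >= m && ldb >= k && ldc >= m);
    foreach (j; 0 .. n)
    {
        // Scale (or clear) column j of C first; when beta == 0 the old
        // contents of C are never read, as in BLAS.
        foreach (i; 0 .. m)
            c[i + j * ldc] = (beta == 0.0) ? 0.0 : beta * c[i + j * ldc];
        // Innermost loop walks down one column of A and of C, i.e.
        // through contiguous memory in column-major storage.
        foreach (l; 0 .. k)
        {
            immutable temp = alpha * b[l + j * ldb];
            foreach (i; 0 .. m)
                c[i + j * ldc] += temp * a[i + l * lda];
        }
    }
}

void main()
{
    // A = [1 2; 3 4] and B = [5 6; 7 8], stored column by column.
    double[] a = [1, 3, 2, 4];
    double[] b = [5, 7, 6, 8];
    auto c = new double[4]; // starts as NaN; ignored because beta == 0
    refGemm(2, 2, 2, 1.0, a, 2, b, 2, 0.0, c, 2);
    // A*B = [19 22; 43 50], i.e. column-major [19, 43, 22, 50].
    assert(c == [19.0, 43.0, 22.0, 50.0]);
    writeln(c);
}
```

Comparing m3.array from mmult2 against refGemm on a few small matrices is a cheap way to confirm the m/n/k and lda/ldb/ldc arguments are wired up correctly before timing the 2000x2000 case.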



More information about the Digitalmars-d mailing list