How to tune numerical D? (matrix multiplication is faster in g++ vs gdc)
J
not_listed at not.not.listed
Mon Mar 4 22:42:48 PST 2013
On Monday, 4 March 2013 at 15:57:42 UTC, jerro wrote:
>> matrixMul2() takes 2.6 seconds on my machine and
>> matrixMul() takes 72 seconds (both compiled with gdmd -O
>> -inline -release -noboundscheck -mavx).
Thanks, Jerro. You made me realize that help from the experts
could be quite useful. I plugged in a call to the BLAS matrix
multiply routine (dgemm), which SciD conveniently binds.
The result? My 2000x2000 matrix multiply went from 98 seconds
down to 1.8 seconds. It's just hilariously faster to use 20 years
of numerical experts' optimized code than to try to write your own.
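// screaming fast version - uses BLAS for 50x speedup over naive code.
//
// Note: dgemm_ works on doubles and treats A, B and C as column-major,
// so this assumes T is double and that Multipliable stores its data
// that way.
Multipliable!(T) mmult2(T)(ref Multipliable!(T) m1,
                           ref Multipliable!(T) m2,
                           ref Multipliable!(T) m3) {
    assert(m1.cols == m2.rows);
    assert(m3.rows == m1.rows && m3.cols == m2.cols);
    m3.array[] = 0;   // not strictly needed: with beta = 0, C is not read
    char ntran = 'N'; // 'N' = use the matrix as-is, no transpose
    double one = 1.0;
    double zero = 0.0;
    int nrow = cast(int)m1.rows;
    int ncol = cast(int)m1.cols;
    int mcol = cast(int)m2.cols;
    // Computes C = alpha*A*B + beta*C, i.e. m3 = m1 * m2.
    scid.bindings.blas.blas.dgemm_(&ntran,       // transa
                                   &ntran,       // transb
                                   &nrow,        // m: rows of A and C
                                   &mcol,        // n: cols of B and C
                                   &ncol,        // k: cols of A, rows of B
                                   &one,         // alpha
                                   m1.array.ptr, // A
                                   &nrow,        // lda
                                   m2.array.ptr, // B
                                   &ncol,        // ldb
                                   &zero,        // beta
                                   m3.array.ptr, // C
                                   &nrow,        // ldc
                                   1,            // transa_len: length of the 1-char string
                                   1);           // transb_len: length of the 1-char string
    return m3;
}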
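
For anyone who wants to sanity-check the BLAS result on a small case, here is a rough sketch of a naive reference multiply. It is not the code I timed above. The name naiveGemm and the flat column-major layout (element (i, j) of an m-by-k matrix at index i + j*m) are assumptions made for illustration, chosen to match what dgemm expects.

```d
import std.stdio;

// Hypothetical reference routine (not from the original post):
// computes C = A * B, where A is m-by-k, B is k-by-n, C is m-by-n,
// and all three are stored column-major in flat arrays.
void naiveGemm(const double[] a, const double[] b, double[] c,
               size_t m, size_t n, size_t k)
{
    assert(a.length == m * k);
    assert(b.length == k * n);
    assert(c.length == m * n);
    c[] = 0.0; // D doubles default to NaN, so the accumulator must be zeroed
    foreach (j; 0 .. n)         // column of B and C
        foreach (p; 0 .. k)     // shared dimension
        {
            immutable bpj = b[p + j * k];
            foreach (i; 0 .. m) // row of A and C; innermost, so accesses are contiguous
                c[i + j * m] += a[i + p * m] * bpj;
        }
}

void main()
{
    // A = [1 2; 3 4] and B = [5 6; 7 8], written out column by column.
    double[] a = [1.0, 3.0, 2.0, 4.0];
    double[] b = [5.0, 7.0, 6.0, 8.0];
    auto c = new double[4];
    naiveGemm(a, b, c, 2, 2, 2);
    // A*B = [19 22; 43 50], again column by column.
    assert(c == [19.0, 43.0, 22.0, 50.0]);
    writeln(c);
}
```

Comparing its output with mmult2 on a small non-square case (say 2x3 times 3x4) should show quickly whether the leading dimensions and the storage order are what dgemm expects.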