Mir GLAS vs Intel MKL: which is faster?
dextorious via Digitalmars-d
digitalmars-d at puremagic.com
Sat Sep 24 10:06:32 PDT 2016
First of all, awesome work. It's great to see that it's possible
to match or even exceed the performance of hand-crafted assembly
implementations with generic code.
I would suggest adding more information on how the Eigen results
were obtained. Unlike OpenBLAS, Eigen's performance often varies
by compiler and depends greatly on which preprocessor macros are
defined. In particular, EIGEN_NO_DEBUG is not defined by default
(unless NDEBUG is), so Eigen's runtime assertions stay enabled and
reduce performance. EIGEN_FAST_MATH trades some accuracy for speed
and can often increase performance. EIGEN_STACK_ALLOCATION_LIMIT
matters greatly for performance on very small matrices, where MKL
and especially OpenBLAS are very inefficient. It's been a while
since I've used Eigen, so I may have forgotten one or two.
It may also be worth noting in the blog post that these are all
single-threaded comparisons and that multithreaded implementations
are on the way. This is obvious to anyone who's followed the
development of Mir, but a general audience on Reddit will likely
point it out as a deficiency unless it is stated upfront.