Mir GLAS vs Intel MKL: which is faster?

Sat Sep 24 10:06:32 PDT 2016

First of all, awesome work. It's great to see that it's possible 
to match or even exceed the performance of hand-crafted assembly 
implementations with generic code.

I would suggest adding more information on how the Eigen results 
were obtained. Unlike OpenBLAS, Eigen performance often varies 
by compiler and varies greatly depending on the kind of 
preprocessor macros that are defined. In particular, 
EIGEN_NO_DEBUG is not defined by default (unless NDEBUG is), so 
Eigen's assertions stay enabled and reduce performance; 
EIGEN_FAST_MATH trades some accuracy for speed (it defaults to 1 
in Eigen 3, so it is worth confirming it was not turned off); and 
EIGEN_STACK_ALLOCATION_LIMIT matters greatly for performance on 
very small matrices (where MKL and especially OpenBLAS are very 
inefficient). It's been a while since I've used 
Eigen, so I may have forgotten one or two.
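
To make that concrete, here is a minimal sketch of how a benchmark 
source file could pin these macros before including Eigen and 
report them alongside the timings. The macro values are 
illustrative assumptions, not the settings used for the blog post, 
and eigen_config_line and BENCH_STR are hypothetical helpers of my 
own. The Eigen include is guarded so the snippet still compiles on 
a machine without Eigen.

```cpp
// Illustrative settings only: NOT the configuration used for the post.
// The defines must appear before any Eigen header is included.
#define EIGEN_NO_DEBUG                       // disable Eigen's assertions
#define EIGEN_FAST_MATH 1                    // allow faster, less exact math
#define EIGEN_STACK_ALLOCATION_LIMIT 131072  // bytes; 0 means no limit

#if defined(__has_include)
#  if __has_include(<Eigen/Dense>)
#    include <Eigen/Dense>
#  endif
#endif

#include <sstream>
#include <string>

// Two-step stringification so the macro's value is printed, not its name.
#define BENCH_STR2(x) #x
#define BENCH_STR(x) BENCH_STR2(x)

// Hypothetical helper: one line describing the Eigen-related settings
// this translation unit was compiled with.
std::string eigen_config_line() {
    std::ostringstream out;
#ifdef EIGEN_NO_DEBUG
    out << "EIGEN_NO_DEBUG=on";
#else
    out << "EIGEN_NO_DEBUG=off";
#endif
    out << " EIGEN_FAST_MATH=" << BENCH_STR(EIGEN_FAST_MATH);
    out << " EIGEN_STACK_ALLOCATION_LIMIT="
        << BENCH_STR(EIGEN_STACK_ALLOCATION_LIMIT);
    return out.str();
}
```

Printing eigen_config_line() next to the compiler name and flags 
would let readers reproduce the Eigen numbers, or at least see 
which configuration they correspond to.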

It may also be worth noting in the blog post that these are all 
single-threaded comparisons and that multithreaded implementations 
are on the way. This is obvious to anyone who's followed the 
development of Mir, but a general audience on Reddit will likely 
point it out as a deficiency unless stated upfront.