BLADE 0.2Alpha: Vector operations with mixins, expression templates, and asm
Don Clugston
dac at nospam.com.au
Tue Apr 3 11:17:35 PDT 2007
I have been trying to come up with a convincing use case for the new
mixins (and for metaprogramming in general). My best effort to date can
be found at:
http://www.dsource.org/projects/mathextra/browser/trunk/mathextra/Blade.d
It generates near-optimal x87 asm code for BLAS1-style basic vector
operations. Vectors with 32-, 64- and 80-bit elements are all supported.
Compile with -version=BladeDebug to see the asm code which is generated.
Typical usage:
void main()
{
    auto p = Vec([1.0L, 2, 18]);      // a vector of 80-bit reals.
    auto q = Vec([3.5L, 1.1, 3.8]);   // ditto
    auto r = Vec([17.0f, 28.25, 1]);  // a vector of 32-bit floats
    auto z = Vec([17.0i, 28.1i, 1i]); // a vector of 64-bit idoubles
    real d = dot(r, p+r+r);
    ireal e = dot(r, z);
    q -= ((r+p)*18.0L*314.1L - (p-r))* 35;
    d = dot(r, p+r+r);
}
Notice that operations mixing element sizes (real[] + float[] - double[])
are supported.
Like the C++ Blitz++ library, expression templates are used to convert
vector expressions into efficient element-wise operations. Unlike that
library, however, there is no reliance on the compiler's optimiser.
Instead, the expression template is manipulated as text, converted into
postfix, and then passed to a simple CTFE compile-time assembler, which
creates highly efficient asm code which is used as a mixin.
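
To give a feel for the "converted into postfix" step, here is a minimal
sketch of an infix-to-postfix conversion that runs under CTFE. It is not
taken from Blade.d: the names toPostfix, precedence and isOperator are
made up for illustration, operands are assumed to be single lower-case
letters, and it is written for a current D compiler rather than the 2007
one.

```d
import std.stdio;

// Hypothetical helpers for illustration; none of these names come from Blade.d.

// Binding strength of a binary operator; higher binds tighter.
int precedence(char op)
{
    return (op == '*' || op == '/') ? 2 : 1;
}

bool isOperator(char c)
{
    return c == '+' || c == '-' || c == '*' || c == '/';
}

// Shunting-yard conversion of an infix expression to postfix.
// Assumes single-letter operands, left-associative operators and
// well-formed input (balanced parentheses, no whitespace handling).
string toPostfix(string infix)
{
    string output;
    char[] stack; // pending operators and open parentheses
    foreach (c; infix)
    {
        if (c >= 'a' && c <= 'z')
        {
            output ~= c;
        }
        else if (isOperator(c))
        {
            // Pop operators that bind at least as tightly as c.
            while (stack.length && isOperator(stack[$ - 1])
                   && precedence(stack[$ - 1]) >= precedence(c))
            {
                output ~= stack[$ - 1];
                stack = stack[0 .. $ - 1];
            }
            stack ~= c;
        }
        else if (c == '(')
        {
            stack ~= c;
        }
        else if (c == ')')
        {
            // Pop back to the matching open parenthesis, then discard it.
            while (stack[$ - 1] != '(')
            {
                output ~= stack[$ - 1];
                stack = stack[0 .. $ - 1];
            }
            stack = stack[0 .. $ - 1];
        }
    }
    // Flush the operators that remain.
    while (stack.length)
    {
        output ~= stack[$ - 1];
        stack = stack[0 .. $ - 1];
    }
    return output;
}

// An enum initialiser forces the call to be evaluated at compile time.
enum postfix = toPostfix("(r+p)*k-(p-r)");
static assert(postfix == "rp+k*pr--");

void main()
{
    writeln(postfix);
}
```

Postfix is a convenient intermediate form here because it maps directly
onto a stack machine such as the x87 FPU: each operand becomes a load and
each operator consumes the top of the stack.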
To understand the later parts of the code, you need some knowledge of
x87 assembler. In fact, you probably need to have read Agner Fog's
superb Pentium optimisation manual (www.agner.org).
Some observations:
* I was amazed at how simple the expression template code is (it is
somewhat cluttered by the code to check for real/imaginary type mismatch
errors).
* I've often read that the x87 floating-point stack is notoriously
difficult for compilers to write code for, but it works quite well in
this case.
* The major limitations I had to work around are:
- inability to use a tuple element directly from asm code (bug #1028);
- inability to define operators for built-in arrays (hence the use of
'Vec' wrappers);
- inability to index through a tuple in a CTFE function (solved by
converting types into a string).
* There have been mutterings about how unhygienic/dangerous the new
mixins are. In this case, the mixin forms the _entire_ body of the
function. This is an interesting situation which I think a language
purist will find more palatable.
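
As a rough illustration of that last point, the sketch below shows a
function whose entire body is a single string mixin produced by a CTFE
function. It is an assumption-laden stand-in, not the code in Blade.d: the
generator elementwiseBody and the function addTwice are hypothetical
names, the generated text is an ordinary D loop rather than x87 asm, and
operands are again assumed to be single lower-case letters.

```d
// Hypothetical CTFE generator, not taken from Blade.d: returns the source
// text of a complete function body that evaluates `expr` element by
// element into `dest`. Each lower-case letter in `expr` names an array.
string elementwiseBody(string dest, string expr)
{
    string rhs;
    foreach (c; expr)
    {
        rhs ~= c;
        if (c >= 'a' && c <= 'z')
            rhs ~= "[i]"; // index every operand with the loop variable
    }
    return "foreach (i; 0 .. " ~ dest ~ ".length) "
         ~ dest ~ "[i] = " ~ rhs ~ ";";
}

// Check the generated text at compile time.
static assert(elementwiseBody("q", "p+r+r")
    == "foreach (i; 0 .. q.length) q[i] = p[i]+r[i]+r[i];");

// The mixin is the whole body, so the only local names the generated
// text can touch are this function's own parameters.
void addTwice(real[] q, const(real)[] p, const(float)[] r)
{
    mixin(elementwiseBody("q", "p+r+r"));
}

void main()
{
    real[] q = [0.0L, 0.0L, 0.0L];
    addTwice(q, [1.0L, 2.0L, 18.0L], [17.0f, 28.25f, 1.0f]);
    assert(q == [35.0L, 58.5L, 20.0L]);
}
```

Because nothing else is declared inside addTwice, there are no
surrounding locals for the generated code to capture or shadow by
accident, which is the hygiene concern usually raised against string
mixins.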
Enjoy.