BLADE 0.2Alpha: Vector operations with mixins, expression templates, and asm

Don Clugston dac at nospam.com.au
Tue Apr 3 11:17:35 PDT 2007


I have been trying to come up with a convincing use case for the new 
mixins (and for metaprogramming in general). My best effort to date can 
be found at:
http://www.dsource.org/projects/mathextra/browser/trunk/mathextra/Blade.d

It generates near-optimal x87 asm code for BLAS1-style basic vector 
operations. Vectors of 32-, 64- and 80-bit elements are all supported.

Compile with -version=BladeDebug to see the asm code which is generated.

Typical usage:

void main()
{
     auto p = Vec([1.0L, 2, 18]);    // a vector of 80-bit reals.
     auto q = Vec([3.5L, 1.1, 3.8]);  // ditto
     auto r = Vec([17.0f, 28.25, 1]); // a vector of 32-bit floats
     auto z = Vec([17.0i, 28.1i, 1i]); // a vector of 64-bit idoubles
     real d = dot(r, p+r+r);
     ireal e = dot(r, z);
     q -= ((r+p)*18.0L*314.1L - (p-r))* 35;
     d = dot(r, p+r+r);
}

Notice that mixed-precision operations (real[] + float[] - double[]) are 
supported.
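For checking results, the mixed float/real dot product in the example can be written as a plain-D reference. This is only a sketch: dotRef is a hypothetical name, it takes slices rather than Vec wrappers, and it is not taken from Blade.d. It assumes accumulation in real, which mirrors how x87 loads widen every operand to the 80-bit register format (on non-x86 targets real may be narrower).

```d
import std.stdio : writeln;

// Hypothetical reference implementation, not part of Blade.d.
// A and B may be any mix of float, double and real.
real dotRef(A, B)(const(A)[] a, const(B)[] b)
{
    assert(a.length == b.length, "vectors must have equal length");
    real sum = 0;
    foreach (i; 0 .. a.length)
    {
        // Widen both operands before multiplying, as an x87 load would.
        sum += cast(real) a[i] * cast(real) b[i];
    }
    return sum;
}

void main()
{
    float[] r = [17.0f, 28.25f, 1.0f];
    real[] p = [1.0L, 2.0L, 18.0L];
    writeln(dotRef(r, p)); // 17*1 + 28.25*2 + 1*18 = 91.5
}
```

A reference like this is only useful for comparing against the generated asm; it does rely on the optimiser for speed.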

Like the C++ Blitz++ library, expression templates are used to convert 
vector expressions into efficient element-wise operations. Unlike that 
library, however, there is no reliance on the compiler's optimiser. 
Instead, the expression template is manipulated as text, converted into 
postfix, and then passed to a simple assembler that runs at compile time 
via CTFE. The assembler creates highly efficient asm code, which is used 
as a mixin.

To understand the later parts of the code, you need some knowledge of 
x87 assembler. In fact, you probably need to have read Agner Fog's 
superb Pentium optimisation manual (www.agner.org).
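The text-to-postfix-to-asm pipeline described above can be sketched in a few lines of CTFE-able D. This is a toy under stated assumptions, not the code in Blade.d: toPostfix, toX87 and prec are hypothetical names, operands are single lowercase letters, only + - * and parentheses are handled, and the output is instruction text that is printed rather than assembled. Since postfix order matches the push/pop discipline of the x87 stack, each operand becomes a load and each operator a popping arithmetic instruction.

```d
import std.stdio : writeln;

// Hypothetical helper: '*' binds tighter than '+' and '-'.
int prec(char op)
{
    return (op == '*') ? 2 : 1;
}

// Infix to postfix (shunting-yard) over single-letter operands.
// Pure string manipulation, so it can run under CTFE.
string toPostfix(string infix)
{
    string output;
    char[] ops; // operator stack
    foreach (char c; infix)
    {
        if (c >= 'a' && c <= 'z')
            output ~= c;
        else if (c == '(')
            ops ~= c;
        else if (c == ')')
        {
            while (ops.length && ops[$ - 1] != '(')
            {
                output ~= ops[$ - 1];
                ops = ops[0 .. $ - 1];
            }
            if (ops.length)
                ops = ops[0 .. $ - 1]; // discard the '('
        }
        else if (c == '+' || c == '-' || c == '*')
        {
            // Pop operators of equal or higher precedence (left-associative).
            while (ops.length && ops[$ - 1] != '('
                   && prec(ops[$ - 1]) >= prec(c))
            {
                output ~= ops[$ - 1];
                ops = ops[0 .. $ - 1];
            }
            ops ~= c;
        }
        // Any other character (e.g. a space) is ignored.
    }
    while (ops.length)
    {
        output ~= ops[$ - 1];
        ops = ops[0 .. $ - 1];
    }
    return output;
}

// Postfix to x87 instruction text. No check is made that the
// expression fits in the eight x87 stack registers.
string toX87(string postfix)
{
    string code;
    foreach (char c; postfix)
    {
        if (c >= 'a' && c <= 'z')
        {
            code ~= "fld ";
            code ~= c;
            code ~= ";\n";
        }
        else if (c == '+') code ~= "faddp ST(1), ST;\n";
        else if (c == '-') code ~= "fsubp ST(1), ST;\n";
        else if (c == '*') code ~= "fmulp ST(1), ST;\n";
    }
    return code;
}

// 'enum' forces both steps to be evaluated at compile time.
enum demoPostfix = toPostfix("(a+b)*c-d");
enum demoAsm = toX87(demoPostfix);
static assert(demoPostfix == "ab+c*d-");

void main()
{
    writeln(demoPostfix);
    writeln(demoAsm);
}
```

In a real generator the resulting string would be wrapped in an asm block and mixed in; here it is left as text so the sketch stays portable.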

Some observations:
* I was amazed at how simple the expression template code is (it is 
somewhat cluttered by the code to check for real/imaginary type mismatch 
errors).
* I've often read that the x87 floating-point stack is notoriously 
difficult for compilers to write code for, but it works quite well in 
this case.
* The major workarounds are:
- inability to use a tuple element directly from asm code (bug #1028);
- inability to define operators for built-in arrays (hence the use of 
'Vec' wrappers);
- inability to index through a tuple in a CTFE function (solved by 
converting types into a string).
* There have been mutterings about how unhygienic/dangerous the new 
mixins are. In this case, the mixin forms the _entire_ body of the 
function. This is an interesting situation which I think a language 
purist will find more palatable.
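Two of the points above can be illustrated with a small sketch in current D. It is not the Blade.d code: codeOf, typeCodes, loadInstr, loopBody and evalInto are hypothetical names, and the one-letter type encoding is an assumption made for the example. The first half flattens a type tuple into a string so that an ordinary CTFE function can index it; the second half shows a function whose entire body is a string mixin, so the generated code can only see the function's own parameters.

```d
import std.stdio : writeln;

// Map an element type to a one-letter code (hypothetical encoding).
template codeOf(T)
{
    static if (is(T == float)) enum string codeOf = "f";
    else static if (is(T == double)) enum string codeOf = "d";
    else static if (is(T == real)) enum string codeOf = "r";
    else static assert(0, "unsupported element type");
}

// Flatten a type tuple into a string, one code per type.
template typeCodes(T...)
{
    static if (T.length == 0)
        enum string typeCodes = "";
    else
        enum string typeCodes = codeOf!(T[0]) ~ typeCodes!(T[1 .. $]);
}

// Ordinary CTFE function: it indexes the string, not the tuple.
string loadInstr(string codes, size_t n)
{
    switch (codes[n])
    {
        case 'f': return "fld float ptr";
        case 'd': return "fld double ptr";
        default:  return "fld real ptr";
    }
}

// Build the complete body of an element-wise loop as text.
string loopBody(string expr)
{
    return "foreach (i; 0 .. dst.length) dst[i] = " ~ expr ~ ";";
}

// The mixin is the whole function body: the generated code can
// refer only to dst, a, b and the loop index i it declares itself.
void evalInto(string expr)(double[] dst, const double[] a, const double[] b)
{
    mixin(loopBody(expr));
}

static assert(typeCodes!(real, float, double) == "rfd");

void main()
{
    auto dst = new double[3];
    evalInto!"a[i] * 2.0 + b[i]"(dst, [1.0, 2.0, 3.0], [10.0, 20.0, 30.0]);
    writeln(dst); // [12, 24, 36]
    writeln(loadInstr(typeCodes!(real, float, double), 1)); // fld float ptr
}
```

Because nothing hand-written shares a scope with the mixed-in text, name clashes between generated and surrounding code cannot arise inside evalInto.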

Enjoy.


