BLADE 0.2Alpha: Vector operations with mixins, expression templates, and asm

Thu Apr 5 02:54:25 PDT 2007

This is really neat.
What I like is that this is basically a compiler extension, and yet it
a) is still quite readable
b) produces simple error messages during compilation (if any)
c) introduces a natural, non-artificial syntax for the task

c) will actually depend more on the versatility of operator overloading
than on the CTFE stuff.
a) and b) are so much better than what C++ template magic gave us (as
can be observed in Boost).

Now I think there's only one requirement missing to make it a perfect
compiler extension, and that is to produce debuggable code.
Actually in this case, as the extension is supposed to generate mostly
single line expression statements, it might not be that important. But
if we consider multi-line DSL statements, a way to mark generated code
as belonging to one of the DSL source lines would be great.

I think a way for a compile-time function (CTF) to receive the file and
line it was called from would do the trick. In BLADE that information
could be added to the expression string, passed through to the postfix
notation, and finally be inserted into the asm code with #line directives.
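
To make the idea concrete, here is a minimal sketch of what I mean. It
assumes a compiler that evaluates __FILE__ and __LINE__ default arguments
at the call site (current D compilers do). The names withLocation and
toDecimal are made up for illustration and are not part of BLADE.

```d
// Hypothetical helper, not part of BLADE: prefixes a DSL expression string
// with a #line directive naming the place the helper was called from.
// The default arguments are filled in at the call site.
string withLocation(string expr,
                    string file = __FILE__,
                    size_t line = __LINE__)
{
    return "#line " ~ toDecimal(line) ~ " \"" ~ file ~ "\"\n" ~ expr;
}

// CTFE-friendly conversion of an unsigned integer to decimal text.
string toDecimal(size_t n)
{
    if (n == 0) return "0";
    string s;
    while (n > 0)
    {
        // Prepend the lowest digit as a one-character slice.
        s = "0123456789"[n % 10 .. n % 10 + 1] ~ s;
        n /= 10;
    }
    return s;
}
```

Something like `enum tagged = withLocation("q -= p * 2");` would then
carry the caller's file and line along with the expression. Whether a
debugger actually attributes the generated asm to that line is something
I have not checked, and file names containing backslashes would need
escaping first.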

Don Clugston wrote:
> I have been trying to come up with a convincing use case for the new
> mixins (and for metaprogramming in general). My best effort to date can
> be found at:
> http://www.dsource.org/projects/mathextra/browser/trunk/mathextra/Blade.d
> 
> It generates near-optimal x87 asm code for BLAS1-style basic vector
> operations. 32, 64 and 80 bit vectors are all supported.
> 
> Compile with -version=BladeDebug to see the asm code which is generated.
> 
> Typical usage:
> 
> void main()
> {
>     auto p = Vec([1.0L, 2, 18]);    // a vector of 80-bit reals.
>     auto q = Vec([3.5L, 1.1, 3.8]);  // ditto
>     auto r = Vec([17.0f, 28.25, 1]); // a vector of 32-bit floats
>     auto z = Vec([17.0i, 28.1i, 1i]); // a vector of 64-bit idoubles
>     real d = dot(r, p+r+r);
>     ireal e = dot(r, z);
>     q -= ((r+p)*18.0L*314.1L - (p-r))* 35;
>     d = dot(r, p+r+r);
> }
> 
> Notice that mixed-length operations (real[] + float[] - double[]) are
> supported.
> 
> Like the C++ Blitz++ library, expression templates are used to convert
> vector expressions into efficient element-wise operations. Unlike that
> library, however, there is no reliance on the compiler's optimiser.
> Instead, the expression template is manipulated as text, converted into
> postfix, and then passed to a simple CTFE compile-time assembler, which
> creates highly efficient asm code which is used as a mixin.
> To understand the later parts of the code, you need some knowledge of
> x87 assembler. In fact, you probably need to have read Agner Fog's
> superb Pentium optimisation manual (www.agner.org).
> 
> Some observations:
> * I was amazed at how simple the expression template code is (it is
> somewhat cluttered by the code to check for real/imaginary type mismatch
> errors).
> * I've often read that the x87 floating-point stack is notoriously
> difficult for compilers to write code for, but it works quite well in
> this case.
> * The major workarounds are:
> - inability to use a tuple element directly from asm code (bug #1028);
> - inability to define operators for built-in arrays (hence the use of
> 'Vec' wrappers).
> - inability to index through a tuple in a CTFE function (solved by
> converting types into a string).
> * There have been mutterings about how unhygienic/dangerous the new
> mixins are. In this case, the mixin forms the _entire_ body of the
> function. This is an interesting situation which I think a language
> purist will find more palatable.
> 
> Enjoy.
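
For readers who have not looked at the source: the "manipulated as text,
converted into postfix" step quoted above can be illustrated with a small
compile-time function. This is only a sketch of the general technique
(a shunting-yard conversion), not the code in Blade.d. It assumes
single-letter lowercase operands, the binary operators + - * /, and
parentheses. The names prec and toPostfix are made up.

```d
// Precedence for the tiny grammar handled here (assumption: only + - * /).
int prec(char op)
{
    return (op == '*' || op == '/') ? 2 : 1;
}

// Convert an infix expression over single-letter operands to postfix.
// All operators are treated as left-associative. Usable in CTFE.
string toPostfix(string infix)
{
    string output;
    char[] stack; // pending operators and open parentheses

    foreach (char c; infix)
    {
        if (c == ' ')
            continue;
        if (c >= 'a' && c <= 'z')
            output ~= c;                  // operands go straight to output
        else if (c == '(')
            stack ~= c;
        else if (c == ')')
        {
            // Flush operators back to the matching '('.
            while (stack.length && stack[$ - 1] != '(')
            {
                output ~= stack[$ - 1];
                stack = stack[0 .. $ - 1];
            }
            if (stack.length)
                stack = stack[0 .. $ - 1]; // drop the '(' itself
        }
        else
        {
            // Operator: first emit pending operators that bind at least
            // as tightly, then push this one.
            while (stack.length && stack[$ - 1] != '('
                   && prec(stack[$ - 1]) >= prec(c))
            {
                output ~= stack[$ - 1];
                stack = stack[0 .. $ - 1];
            }
            stack ~= c;
        }
    }
    // Emit whatever operators remain.
    while (stack.length)
    {
        output ~= stack[$ - 1];
        stack = stack[0 .. $ - 1];
    }
    return output;
}

// Evaluated entirely at compile time.
static assert(toPostfix("a + b * c") == "abc*+");
```

A postfix string like this maps naturally onto a stack machine such as
the x87 FPU: each operand becomes a load and each operator combines the
top two stack entries, which is presumably why the postfix form is a
convenient input for the compile-time assembler.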