BLADE 0.2Alpha: Vector operations with mixins, expression templates, and asm

Davidl Davidl at 126.com
Tue Apr 3 18:10:51 PDT 2007


I think compilers now face a real challenge from this kind of
back-end-like power at compile time.

> I have been trying to come up with a convincing use case for the new  
> mixins (and for metaprogramming in general). My best effort to date can  
> be found at:
> http://www.dsource.org/projects/mathextra/browser/trunk/mathextra/Blade.d
>
> It generates near-optimal x87 asm code for BLAS1-style basic vector  
> operations. 32, 64 and 80 bit vectors are all supported.
>
> Compile with -version=BladeDebug to see the asm code which is generated.
>
> Typical usage:
>
> void main()
> {
>      auto p = Vec([1.0L, 2, 18]);    // a vector of 80-bit reals.
>      auto q = Vec([3.5L, 1.1, 3.8]);  // ditto
>      auto r = Vec([17.0f, 28.25, 1]); // a vector of 32-bit floats
>      auto z = Vec([17.0i, 28.1i, 1i]); // a vector of 64-bit idoubles
>      real d = dot(r, p+r+r);
>      ireal e = dot(r, z);
>      q -= ((r+p)*18.0L*314.1L - (p-r))* 35;
>      d = dot(r, p+r+r);
> }
>
> Notice that operations mixing element widths (real[] + float[] - double[])
> are supported.
>
> Like the C++ Blitz++ library, expression templates are used to convert  
> vector expressions into efficient element-wise operations. Unlike that  
> library, however, there is no reliance on the compiler's optimiser.  
> Instead, the expression template is manipulated as text, converted into  
> postfix, and then passed to a simple assembler that runs at compile time
> (via CTFE) and emits highly efficient asm code, which is used as a mixin.
> To understand the later parts of the code, you need some knowledge of  
> x87 assembler. In fact, you probably need to have read Agner Fog's  
> superb Pentium optimisation manual (www.agner.org).
>
> Some observations:
> * I was amazed at how simple the expression template code is (it is  
> somewhat cluttered by the code to check for real/imaginary type mismatch  
> errors).
> * I've often read that the x87 floating-point stack is notoriously  
> difficult for compilers to write code for, but it works quite well in  
> this case.
> * The major workarounds are:
>   - inability to use a tuple element directly from asm code (bug #1028);
>   - inability to define operators for built-in arrays (hence the use of
>     'Vec' wrappers);
>   - inability to index through a tuple in a CTFE function (solved by
>     converting types into a string).
> * There have been mutterings about how unhygienic/dangerous the new
> mixins are. In this case, the mixin forms the _entire_ body of the  
> function. This is an interesting situation which I think a language  
> purist will find more palatable.
>
> Enjoy.
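
The pipeline described in the quoted message (expression as text, converted to postfix, then turned into code that is mixed in as the whole function body) can be illustrated with a much smaller toy. The sketch below is not taken from Blade.d: the names prec, toPostfix, toElementwise and eval are hypothetical, operands are limited to single lowercase letters, and it emits a plain D expression rather than x87 asm. It only aims to show that ordinary string functions can do this work at compile time.

```d
// Toy sketch only: hypothetical names, not Blade's API, and it generates
// D source text instead of x87 asm.

// Operator precedence: '*' binds tighter than '+' and '-'.
int prec(char op)
{
    return (op == '*') ? 2 : 1;
}

// Shunting-yard conversion from infix to space-separated postfix.
// Operands are single lowercase letters. Plain string code, so CTFE can run it.
string toPostfix(string infix)
{
    string output;
    char[] ops; // operator stack
    foreach (char c; infix)
    {
        if (c >= 'a' && c <= 'z')
        {
            output ~= c;
            output ~= ' ';
        }
        else if (c == '(')
            ops ~= c;
        else if (c == ')')
        {
            while (ops.length && ops[$ - 1] != '(')
            {
                output ~= ops[$ - 1];
                output ~= ' ';
                ops = ops[0 .. $ - 1];
            }
            if (ops.length)
                ops = ops[0 .. $ - 1]; // discard the '('
        }
        else if (c == '+' || c == '-' || c == '*')
        {
            // Pop operators of equal or higher precedence (left-associative).
            while (ops.length && ops[$ - 1] != '('
                   && prec(ops[$ - 1]) >= prec(c))
            {
                output ~= ops[$ - 1];
                output ~= ' ';
                ops = ops[0 .. $ - 1];
            }
            ops ~= c;
        }
    }
    while (ops.length)
    {
        output ~= ops[$ - 1];
        output ~= ' ';
        ops = ops[0 .. $ - 1];
    }
    return output.length ? output[0 .. $ - 1] : output; // drop trailing space
}

// Walk the postfix string with a stack and build a per-element D expression.
string toElementwise(string postfix)
{
    string[] stack;
    foreach (char c; postfix)
    {
        if (c == ' ')
            continue;
        if (c >= 'a' && c <= 'z')
        {
            string operand;
            operand ~= c;
            operand ~= "[i]";
            stack ~= operand;
        }
        else
        {
            string rhs = stack[$ - 1];
            string lhs = stack[$ - 2];
            stack = stack[0 .. $ - 2];
            string combined = "(" ~ lhs ~ " ";
            combined ~= c;
            combined ~= " " ~ rhs ~ ")";
            stack ~= combined;
        }
    }
    return stack[0];
}

// The generated text is mixed in as the entire loop body.
double[] eval(string expr)(double[] a, double[] b, double[] c)
{
    auto result = new double[a.length];
    foreach (i; 0 .. a.length)
    {
        mixin("result[i] = " ~ toElementwise(toPostfix(expr)) ~ ";");
    }
    return result;
}

// Both stages are evaluated by the compiler, not at run time.
static assert(toPostfix("a+b*c") == "a b c * +");
static assert(toElementwise("a b c * +") == "(a[i] + (b[i] * c[i]))");

void main()
{
    auto r = eval!"(a+b)*c-a"([1.0, 2.0], [3.0, 4.0], [5.0, 6.0]);
    assert(r == [19.0, 34.0]);
}
```

Because the expression is a template argument, the string handling happens entirely during compilation; the compiled function contains only the generated loop. Swapping toElementwise for a code generator that emits asm text would be the step closer to what the quoted message describes.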
