BLADE 0.2Alpha: Vector operations with mixins, expression templates, and asm

Bruno Medeiros brunodomedeiros+spam at com.gmail
Sat Apr 7 06:36:05 PDT 2007


Pragma wrote:
> Don Clugston wrote:
>> I have been trying to come up with a convincing use case for the new 
>> mixins (and for metaprogramming in general). My best effort to date 
>> can be found at:
>> http://www.dsource.org/projects/mathextra/browser/trunk/mathextra/Blade.d
>>
>> It generates near-optimal x87 asm code for BLAS1-style basic vector 
>> operations. 32, 64 and 80 bit vectors are all supported.
>>
>> Compile with -version=BladeDebug to see the asm code which is generated.
>>
>> Typical usage:
>>
>> void main()
>> {
>>     auto p = Vec([1.0L, 2, 18]);    // a vector of 80-bit reals.
>>     auto q = Vec([3.5L, 1.1, 3.8]);  // ditto
>>     auto r = Vec([17.0f, 28.25, 1]); // a vector of 32-bit floats
>>     auto z = Vec([17.0i, 28.1i, 1i]); // a vector of 64-bit idoubles
>>     real d = dot(r, p+r+r);
>>     ireal e = dot(r, z);
>>     q -= ((r+p)*18.0L*314.1L - (p-r))* 35;
>>     d = dot(r, p+r+r);
>> }
>>
>> Notice that mixed-length operations (real[] + float[] - double[]) are 
>> supported.
>>
>> Like the C++ Blitz++ library, expression templates are used to convert 
>> vector expressions into efficient element-wise operations. Unlike that 
>> library, however, there is no reliance on the compiler's optimiser. 
>> Instead, the expression template is manipulated as text, converted 
>> into postfix, and then passed to a simple CTFE assembler, which 
>> generates highly efficient asm code that is used as a mixin.
>> To understand the later parts of the code, you need some knowledge of 
>> x87 assembler. In fact, you probably need to have read Agner Fog's 
>> superb Pentium optimisation manual (www.agner.org).
>>
>> Some observations:
>> * I was amazed at how simple the expression template code is (it is 
>> somewhat cluttered by the code to check for real/imaginary type 
>> mismatch errors).
>> * I've often read that the x87 floating-point stack is notoriously 
>> difficult for compilers to write code for, but it works quite well in 
>> this case.
>> * The major workarounds are:
>> - inability to use a tuple element directly from asm code (bug #1028);
>> - inability to define operators for built-in arrays (hence the use of 
>> 'Vec' wrappers);
>> - inability to index through a tuple in a CTFE function (solved by 
>> converting types into a string).
>> * There have been mutterings about how unhygienic/dangerous the new 
>> mixins are. In this case, the mixin forms the _entire_ body of the 
>> function. This is an interesting situation which I think a language 
>> purist will find more palatable.
>>
>> Enjoy.
> 
> This is a work of art Don - it's practically a compiler extension.  Nice 
> job. :)
> 
> For others that are interested in how this actually gets the job done, I 
> found this in your documentation:
> 
> * THEORY:
> * Expression templates are used to create an expression string of the
> * form "(a+b*c)+d" and a tuple, the entries of which correspond to
> * a, b, c, d, ...
> * This string is converted to postfix. The postfix string is converted to
> * a string containing x87 asm, which is then mixed into a function which
> * accepts the tuple.
> 

Hmm, a minor question: is a string representation of the expressions 
better (easier to use, manipulate, etc.) than a tree representation?
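
To make the question concrete, here is a minimal sketch of the string route, assuming single-letter operands and only the four arithmetic operators. It is not taken from Blade.d; the names prec and toPostfix are made up for illustration. It converts the "(a+b*c)+d" example from the documentation into postfix entirely at compile time:

```d
// Hypothetical sketch, NOT code from Blade.d: convert an infix expression
// string over single-letter operands into postfix using CTFE.

// Operator precedence; 0 means "not an operator" (this includes '(').
int prec(char op)
{
    switch (op)
    {
        case '+': case '-': return 1;
        case '*': case '/': return 2;
        default: return 0;
    }
}

// Shunting-yard conversion for left-associative binary operators.
string toPostfix(string infix)
{
    string output;
    char[] stack;   // pending operators and '('
    foreach (char c; infix)
    {
        if (c >= 'a' && c <= 'z')
            output ~= c;                    // operands go straight out
        else if (c == '(')
            stack ~= c;
        else if (c == ')')
        {
            // Flush operators back to the matching '('.
            while (stack.length && stack[$ - 1] != '(')
            {
                output ~= stack[$ - 1];
                stack = stack[0 .. $ - 1];
            }
            if (stack.length)
                stack = stack[0 .. $ - 1];  // drop the '(' itself
        }
        else if (prec(c) > 0)
        {
            // Pop operators of equal or higher precedence first.
            while (stack.length && prec(stack[$ - 1]) >= prec(c))
            {
                output ~= stack[$ - 1];
                stack = stack[0 .. $ - 1];
            }
            stack ~= c;
        }
    }
    // Flush whatever operators remain.
    while (stack.length)
    {
        output ~= stack[$ - 1];
        stack = stack[0 .. $ - 1];
    }
    return output;
}

// Initialising an enum forces the call to be evaluated at compile time.
enum postfix = toPostfix("(a+b*c)+d");
static assert(postfix == "abc*+d+");
pragma(msg, postfix);   // prints the postfix string during compilation
```

The sketch defines no main(), so it can be compiled with dmd -main. Since the intermediate form is a plain string, it can be built, printed with pragma(msg) and passed on to a string mixin using nothing but ordinary functions run under CTFE, which may be part of the appeal over a tree.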

-- 
Bruno Medeiros - MSc in CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D


