BLADE 0.2Alpha: Vector operations with mixins, expression templates, and asm
Don Clugston
dac at nospam.com.au
Mon Apr 9 10:49:01 PDT 2007
Pragma wrote:
> Bruno Medeiros wrote:
>> Pragma wrote:
>>> Don Clugston wrote:
>>>> I have been trying to come up with a convincing use case for the new
>>>> mixins (and for metaprogramming in general). My best effort to date
>>>> can be found at:
>>>> http://www.dsource.org/projects/mathextra/browser/trunk/mathextra/Blade.d
>>>>
>>>>
>>>> It generates near-optimal x87 asm code for BLAS1-style basic vector
>>>> operations. 32, 64 and 80 bit vectors are all supported.
>>>>
>>>> Compile with -version=BladeDebug to see the asm code which is
>>>> generated.
>>>>
>>>> Typical usage:
>>>>
>>>> void main()
>>>> {
>>>> auto p = Vec([1.0L, 2, 18]); // a vector of 80-bit reals.
>>>> auto q = Vec([3.5L, 1.1, 3.8]); // ditto
>>>> auto r = Vec([17.0f, 28.25, 1]); // a vector of 32-bit floats
>>>> auto z = Vec([17.0i, 28.1i, 1i]); // a vector of 64-bit idoubles
>>>> real d = dot(r, p+r+r);
>>>> ireal e = dot(r, z);
>>>> q -= ((r+p)*18.0L*314.1L - (p-r))* 35;
>>>> d = dot(r, p+r+r);
>>>> }
>>>>
>>>> Notice that mixed-length operations (real[] + float[] - double[])
>>>> are supported.
>>>>
>>>> Like the C++ Blitz++ library, expression templates are used to
>>>> convert vector expressions into efficient element-wise operations.
>>>> Unlike that library, however, there is no reliance on the compiler's
>>>> optimiser. Instead, the expression template is manipulated as text,
>>>> converted into postfix, and then passed to a simple CTFE
>>>> compile-time assembler, which creates highly efficient asm code
>>>> which is used as a mixin.
>>>> To understand the later parts of the code, you need some knowledge
>>>> of x87 assembler. In fact, you probably need to have read Agner
>>>> Fog's superb Pentium optimisation manual (www.agner.org).
>>>>
>>>> Some observations:
>>>> * I was amazed at how simple the expression template code is (it is
>>>> somewhat cluttered by the code to check for real/imaginary type
>>>> mismatch errors).
>>>> * I've often read that the x87 floating-point stack is notoriously
>>>> difficult for compilers to write code for, but it works quite well
>>>> in this case.
>>>> * The major workarounds are:
>>>> - inability to use a tuple element directly from asm code (bug #1028);
>>>> - inability to define operators for built-in arrays (hence the use
>>>> of 'Vec' wrappers).
>>>> - inability to index through a tuple in a CTFE function (solved by
>>>> converting types into a string).
>>>> * There have been mutterings about how unhygenic/dangerous the new
>>>> mixins are. In this case, the mixin forms the _entire_ body of the
>>>> function. This is an interesting situation which I think a language
>>>> purist will find more palatable.
>>>>
>>>> Enjoy.
>>>
>>> This is a work of art Don - it's practically a compiler extension.
>>> Nice job. :)
>>>
>>> For others that are interested in how this actually gets the job
>>> done, I found this in your documentation:
>>>
>>> * THEORY:
>>> * Expression templates are used to create an expression string of the
>>> form "(a+b*c)+d"
>>> * and a tuple, the entries of which correspond to a, b, c, d, ...
>>> * This string is converted to postfix. The postfix string is
>>> converted to
>>> * a string containing x87 asm, which is then mixed into a function
>>> which accepts the tuple.
>>>
>>
>> Hum, a minor question, is a string representation of the expressions
>> better (easier to use, manipulate, etc.) than a tree representation?
>>
>
> I know Don wrote this lib, but I think I can answer this.
>
> In short: no. But it's really an unintentionally loaded question: there
> are some rather severe limitations as to what kind of data structures
> you can create at compile time. You're basically limited to tuples and
> strings, each of which have drawbacks of their own.
>
> You can create a tree using tuples, but then you run into problems with
> over-running the limitations on identifier length built into the
> compiler.
Actually I don't know how to make a tree nicely with tuples.
Strings are no where near as flexible out of the box, but
> pose no storage limitations, which gives a slight edge to string-based
> representations of data. With some clever coding, strings can be made
> to store just about any structure imaginable (even if it does make for
> some ugly code).
There's a couple of minor things, though: (1) in the case of generating
D code to mixin, no tree is required (the string just gets mixed back in
again, so it might as well stay as a string).
(2) The "separate string and tuple" structure means the expression can
be optimised without needing to move any of the values around; this
creates less garbage for the compiler to optimise away.
(3) Some patterns are probably easier to find with a string, than a tree.
(4) It's more natural for D coders to think in terms of arrays, than
lists. (and there is much more language support).
(5) A string representation can easily express things like c=a+3*(a+b);
where 'a' appears twice. (At present, there's no way to generate it, but
theoretically the compiler could provide it many cases).
So -- if trees were available, I'm not sure if I'd use them, or not.
More information about the Digitalmars-d
mailing list