BLADE 0.2Alpha: Vector operations with mixins, expression templates, and asm
Don Clugston
dac at nospam.com.au
Tue Apr 10 00:01:53 PDT 2007
Pragma wrote:
> Don Clugston wrote:
>> Pragma wrote:
>>> Bruno Medeiros wrote:
>>>> Pragma wrote:
>>>>> Don Clugston wrote:
>>>>>> I have been trying to come up with a convincing use case for the
>>>>>> new mixins (and for metaprogramming in general). My best effort to
>>>>>> date can be found at:
>>>>>> http://www.dsource.org/projects/mathextra/browser/trunk/mathextra/Blade.d
>>>>>>
>>>>>>
>>>>>> It generates near-optimal x87 asm code for BLAS1-style basic
>>>>>> vector operations. 32, 64 and 80 bit vectors are all supported.
>>>>>>
>>>>>> Compile with -version=BladeDebug to see the asm code which is
>>>>>> generated.
>>>>>>
>>>>>> Typical usage:
>>>>>>
>>>>>> void main()
>>>>>> {
>>>>>>     auto p = Vec([1.0L, 2, 18]);     // a vector of 80-bit reals
>>>>>>     auto q = Vec([3.5L, 1.1, 3.8]);  // ditto
>>>>>>     auto r = Vec([17.0f, 28.25, 1]); // a vector of 32-bit floats
>>>>>>     auto z = Vec([17.0i, 28.1i, 1i]); // a vector of 64-bit idoubles
>>>>>>     real d = dot(r, p+r+r);
>>>>>>     ireal e = dot(r, z);
>>>>>>     q -= ((r+p)*18.0L*314.1L - (p-r)) * 35;
>>>>>>     d = dot(r, p+r+r);
>>>>>> }
>>>>>>
>>>>>> Notice that mixed-length operations (real[] + float[] - double[])
>>>>>> are supported.
>>>>>>
>>>>>> Like the C++ Blitz++ library, expression templates are used to
>>>>>> convert vector expressions into efficient element-wise operations.
>>>>>> Unlike that library, however, there is no reliance on the
>>>>>> compiler's optimiser. Instead, the expression template is
>>>>>> manipulated as text, converted into postfix, and then passed to a
>>>>>> simple CTFE compile-time assembler, which creates highly efficient
>>>>>> asm code which is used as a mixin.
>>>>>> To understand the later parts of the code, you need some knowledge
>>>>>> of x87 assembler. In fact, you probably need to have read Agner
>>>>>> Fog's superb Pentium optimisation manual (www.agner.org).
>>>>>>
>>>>>> Some observations:
>>>>>> * I was amazed at how simple the expression template code is (it
>>>>>> is somewhat cluttered by the code to check for real/imaginary type
>>>>>> mismatch errors).
>>>>>> * I've often read that the x87 floating-point stack is notoriously
>>>>>> difficult for compilers to write code for, but it works quite well
>>>>>> in this case.
>>>>>> * The major workarounds are:
>>>>>>   - inability to use a tuple element directly from asm code (bug
>>>>>> #1028);
>>>>>>   - inability to define operators for built-in arrays (hence the
>>>>>> use of 'Vec' wrappers);
>>>>>>   - inability to index through a tuple in a CTFE function (solved
>>>>>> by converting types into a string).
>>>>>> * There have been mutterings about how unhygienic/dangerous the new
>>>>>> mixins are. In this case, the mixin forms the _entire_ body of the
>>>>>> function. This is an interesting situation which I think a
>>>>>> language purist will find more palatable.
>>>>>>
>>>>>> Enjoy.
>>>>>
>>>>> This is a work of art Don - it's practically a compiler extension.
>>>>> Nice job. :)
>>>>>
>>>>> For others that are interested in how this actually gets the job
>>>>> done, I found this in your documentation:
>>>>>
>>>>> * THEORY:
>>>>> * Expression templates are used to create an expression string of
>>>>> the form "(a+b*c)+d"
>>>>> * and a tuple, the entries of which correspond to a, b, c, d, ...
>>>>> * This string is converted to postfix. The postfix string is
>>>>> converted to
>>>>> * a string containing x87 asm, which is then mixed into a function
>>>>> which accepts the tuple.
>>>>>
>>>>
>>>> Hum, a minor question, is a string representation of the expressions
>>>> better (easier to use, manipulate, etc.) than a tree representation?
>>>>
>>>
>>> I know Don wrote this lib, but I think I can answer this.
>>>
>>> In short: no. But it's really an unintentionally loaded question:
>>> there are some rather severe limitations as to what kind of data
>>> structures you can create at compile time. You're basically limited
>>> to tuples and strings, each of which have drawbacks of their own.
>>>
>>> You can create a tree using tuples, but then you run into problems
>>> with over-running the limitations on identifier length built into the
>>> compiler.
>> Actually I don't know how to make a tree nicely with tuples.
>
> Here's an example (for grins):
>
> import std.stdio;
>
> template Node(char[] Data, Nodes...) {
>     alias Data data;
>     alias Nodes nodes;
> }
>
> // example "tree" structure
> alias Node!("1",
>     Node!("1"),
>     Node!("2",
>         Node!("1"),
>         Node!("2")
>     )
> ) dataset;
The problem I was referring to is how to store both values and
functions/operators inside the tree. It seems to get messy very quickly.
>> So -- if trees were available, I'm not sure if I'd use them, or not.
>
> They're really handy. I've found some uses for them already:
I meant for this application. There's no doubt they're indispensable in
other contexts.
>
> /*
>     ZeroOrMoreExpr
>         = generate.ZeroOrMoreExpr()
>         ::= "[" "{" Expression:expr "}" "]" [RangeSetExpr:ranges]
>             [Binding:binding] ["*" SubExpressionTerm:delim]
>             [SubExpressionTerm:term];
> */
> bool ZeroOrMoreExpr(inout char[] value, inout char[] bindings,
>                     inout char[] tokens, inout char[] err) {
>     return Rule!(
>         generate.ZeroOrMoreExpr,
>         And!(
>             Literal!("["),
>             Literal!("{"),
>             Bind!(Expression, "expr"),
>             Literal!("}"),
>             Literal!("]"),
>             Optional!(
>                 Bind!(RangeSetExpr, "ranges")
>             ),
>             Optional!(
>                 Bind!(Binding, "binding")
>             ),
>             Optional!(
>                 And!(
>                     Literal!("*"),
>                     Bind!(SubExpressionTerm, "delim")
>                 )
>             ),
>             Bind!(SubExpressionTerm, "term")
>         )
>     )(value, bindings, tokens, err);
> }
>
> There's a *lot* going on in this sample. Basically what you see here is
> a chunk of the as-of-yet-experimental compile-time Enki parser. This
> piece parses the Zero-or-more expression part of the EBNF variant that
> Enki supports.
>
> The templates evaluate to CTFE functions that in turn make up the parser when
> it's executed. So there's several layers of compile-time evaluation
> going on here. Also, what's not obvious is that "char[] tokens" is
> actually an *array of strings* that is stored in a single char[] array;
> each string's length data is encoded as a size_t mapped onto the
> appropriate number of chars. The Bind!() expressions also map parsed
> out data to a key/value set which is stored in a similar fashion (char[]
> bindings).
Seriously cool! Seems like you're generating a tree of nested mixins?
Anyway, I suspect this will really benefit from any compiler
improvements, e.g. CTFE support for AAs or nested functions. And obviously
AST macros.
> The goals here were to side-step the limitations of D's identifier
> name-length while providing a user-readable toolkit for parser
> composition; the only limit now is placed on the length of any given
> rule. I haven't found a way to unify the compile-time and run-time
> parser generation yet (one backend with two targets), but I anticipate
> using something very similar to the above by providing a CT and RT
> version of the template library. At a minimum I plan on having this
> consume a .bnf file at compile-time, in order to create a run-time
> parser via mixin.
I really think the new mixins + CTFE were a breakthrough. Publish some of
this stuff, and I think we'll see an exodus of many Boost developers into D.
More information about the Digitalmars-d
mailing list