BLADE 0.2Alpha: Vector operations with mixins, expression templates, and asm

KlausO oberhofer at users.sf.net
Tue Apr 10 04:05:38 PDT 2007


Pragma wrote:
> Don Clugston wrote:
>> Pragma wrote:
>>> Bruno Medeiros wrote:
>>>> Pragma wrote:
>>>>> Don Clugston wrote:
>>>>>> I have been trying to come up with a convincing use case for the 
>>>>>> new mixins (and for metaprogramming in general). My best effort to 
>>>>>> date can be found at:
>>>>>> http://www.dsource.org/projects/mathextra/browser/trunk/mathextra/Blade.d 
>>>>>>
>>>>>>
>>>>>> It generates near-optimal x87 asm code for BLAS1-style basic 
>>>>>> vector operations. 32, 64 and 80 bit vectors are all supported.
>>>>>>
>>>>>> Compile with -version=BladeDebug to see the asm code which is 
>>>>>> generated.
>>>>>>
>>>>>> Typical usage:
>>>>>>
>>>>>> void main()
>>>>>> {
>>>>>>     auto p = Vec([1.0L, 2, 18]);    // a vector of 80-bit reals.
>>>>>>     auto q = Vec([3.5L, 1.1, 3.8]);  // ditto
>>>>>>     auto r = Vec([17.0f, 28.25, 1]); // a vector of 32-bit floats
>>>>>>     auto z = Vec([17.0i, 28.1i, 1i]); // a vector of 64-bit idoubles
>>>>>>     real d = dot(r, p+r+r);
>>>>>>     ireal e = dot(r, z);
>>>>>>     q -= ((r+p)*18.0L*314.1L - (p-r))* 35;
>>>>>>     d = dot(r, p+r+r);
>>>>>> }
>>>>>>
>>>>>> Notice that mixed-length operations (real[] + float[] - double[]) 
>>>>>> are supported.
>>>>>>
>>>>>> Like the C++ Blitz++ library, expression templates are used to 
>>>>>> convert vector expressions into efficient element-wise operations. 
>>>>>> Unlike that library, however, there is no reliance on the 
>>>>>> compiler's optimiser. Instead, the expression template is 
>>>>>> manipulated as text, converted into postfix, and then passed to a 
>>>>>> simple CTFE compile-time assembler, which creates highly efficient 
>>>>>> asm code which is used as a mixin.
>>>>>> To understand the later parts of the code, you need some knowledge 
>>>>>> of x87 assembler. In fact, you probably need to have read Agner 
>>>>>> Fog's superb Pentium optimisation manual (www.agner.org).
>>>>>>
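>>>>>> As a rough illustration of the final step (a simplified sketch,
>>>>>> not the actual Blade.d source): a CTFE function can walk the
>>>>>> postfix string and emit one x87 instruction per token, using the
>>>>>> FPU register stack to hold intermediate results.
>>>>>>
>>>>>> // Sketch only: 'a','b',... stand for operands; the real generator
>>>>>> // loads the corresponding tuple elements instead.
>>>>>> char[] postfixToX87(char[] postfix)
>>>>>> {
>>>>>>     char[] code = "";
>>>>>>     for (int i = 0; i < postfix.length; ++i) {
>>>>>>         char c = postfix[i];
>>>>>>         if (c >= 'a' && c <= 'z')
>>>>>>             code ~= "fld operand_" ~ c ~ ";\n"; // push an operand
>>>>>>         else if (c == '+') code ~= "faddp;\n";  // add top two, pop
>>>>>>         else if (c == '-') code ~= "fsubp;\n";
>>>>>>         else if (c == '*') code ~= "fmulp;\n";
>>>>>>     }
>>>>>>     return code;  // ends up inside an asm { ... } mixin
>>>>>> }
>>>>>>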
>>>>>> Some observations:
>>>>>> * I was amazed at how simple the expression template code is (it 
>>>>>> is somewhat cluttered by the code to check for real/imaginary type 
>>>>>> mismatch errors).
>>>>>> * I've often read that the x87 floating-point stack is notoriously 
>>>>>> difficult for compilers to write code for, but it works quite well 
>>>>>> in this case.
>>>>>> * The major workarounds are:
>>>>>> - inability to use a tuple element directly from asm code (bug 
>>>>>> #1028);
>>>>>> - inability to define operators for built-in arrays (hence the use 
>>>>>> of 'Vec' wrappers).
>>>>>> - inability to index through a tuple in a CTFE function (solved by 
>>>>>> converting types into a string).
>>>>>> * There have been mutterings about how unhygienic/dangerous the new 
>>>>>> mixins are. In this case, the mixin forms the _entire_ body of the 
>>>>>> function. This is an interesting situation which I think a 
>>>>>> language purist will find more palatable.
>>>>>>
>>>>>> Enjoy.
>>>>>
>>>>> This is a work of art Don - it's practically a compiler extension.  
>>>>> Nice job. :)
>>>>>
>>>>> For others that are interested in how this actually gets the job 
>>>>> done, I found this in your documentation:
>>>>>
>>>>> * THEORY:
>>>>> * Expression templates are used to create an expression string of 
>>>>> the form "(a+b*c)+d"
>>>>> * and a tuple, the entries of which correspond to a, b, c, d, ...
>>>>> * This string is converted to postfix. The postfix string is 
>>>>> converted to
>>>>> * a string containing x87 asm, which is then mixed into a function 
>>>>> which accepts the tuple.
>>>>>
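>>>>> To make the middle step concrete, a shunting-yard conversion for
>>>>> single-letter operands can be written as a CTFE function, roughly
>>>>> like this (a simplified sketch, not the code in Blade.d):
>>>>>
>>>>> int prec(char c) { return (c == '*' || c == '/') ? 2 : 1; }
>>>>>
>>>>> char[] toPostfix(char[] expr)
>>>>> {
>>>>>     char[] output = "", ops = "";
>>>>>     for (int i = 0; i < expr.length; ++i) {
>>>>>         char c = expr[i];
>>>>>         if (c >= 'a' && c <= 'z') output ~= c;      // operand
>>>>>         else if (c == '(') ops ~= c;
>>>>>         else if (c == ')') {
>>>>>             while (ops[$-1] != '(') {               // flush to '('
>>>>>                 output ~= ops[$-1]; ops = ops[0..$-1];
>>>>>             }
>>>>>             ops = ops[0..$-1];                      // discard '('
>>>>>         } else {                                    // operator
>>>>>             while (ops.length && ops[$-1] != '('
>>>>>                    && prec(ops[$-1]) >= prec(c)) {
>>>>>                 output ~= ops[$-1]; ops = ops[0..$-1];
>>>>>             }
>>>>>             ops ~= c;
>>>>>         }
>>>>>     }
>>>>>     while (ops.length) { output ~= ops[$-1]; ops = ops[0..$-1]; }
>>>>>     return output;
>>>>> }
>>>>>
>>>>> static assert(toPostfix("(a+b*c)+d") == "abc*+d+");
>>>>>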
>>>>
>>>> Hum, a minor question, is a string representation of the expressions 
>>>> better (easier to use, manipulate, etc.) than a tree representation?
>>>>
>>>
>>> I know Don wrote this lib, but I think I can answer this.
>>>
>>> In short: no.  But it's really an unintentionally loaded question: 
>>> there are some rather severe limitations as to what kind of data 
>>> structures you can create at compile time.  You're basically limited 
>>> to tuples and strings, each of which has drawbacks of its own.
>>>
>>> You can create a tree using tuples, but then you run into problems 
>>> with over-running the limitations on identifier length built into the 
>>> compiler. 
>> Actually I don't know how to make a tree nicely with tuples.
> 
> Here's an example (for grins):
> 
> import std.stdio;
> 
> template Node(char[] Data,Nodes...){
>     alias Data data;
>     alias Nodes nodes;
> }
> 
> template PrintData(char[] parent){
>     const char[] PrintData = "";
> }
> 
> template PrintData(char[] parent,alias Node){
>     static if(parent == ""){
>         const char[] PrintData = Node.data ~ "\n" ~
>             PrintData!(Node.data,Node.nodes);
>     }
>     else{
>         const char[] PrintData = parent ~ "." ~ Node.data ~ "\n" ~
>             PrintData!(parent ~ "." ~ Node.data,Node.nodes);
>     }
> }
> 
> template PrintData(char[] parent,alias Node,V...){
>     const char[] PrintData = PrintData!(parent,Node) ~ 
> PrintData!(parent,V);
> }
> 
> // example "tree" structure
> alias Node!("1",
>     Node!("1"),
>     Node!("2",
>         Node!("1"),
>         Node!("2")
>     )
> ) dataset;
> 
> void main(){
>     writefln("%s",PrintData!("",dataset));
> }
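> 
> For reference, PrintData!("",dataset) expands at compile time to the 
> string "1\n1.1\n1.2\n1.2.1\n1.2.2\n", so this prints one dotted path 
> per line.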
> 
>>
>>> Strings are nowhere near as flexible out of the box, but
>>> pose no storage limitations, which gives a slight edge to 
>>> string-based representations of data.  With some clever coding, 
>>> strings can be made to store just about any structure imaginable 
>>> (even if it does make for some ugly code).
>>
>> There are a couple of minor things, though: (1) in the case of 
>> generating D code to mixin, no tree is required (the string just gets 
>> mixed back in again, so it might as well stay as a string).
>> (2) The "separate string and tuple" structure means the expression can 
>> be optimised without needing to move any of the values around; this 
>> creates less garbage for the compiler to optimise away.
>> (3) Some patterns are probably easier to find with a string, than a tree.
>> (4) It's more natural for D coders to think in terms of arrays than 
>> lists (and there is much more language support).
>>
>> (5) A string representation can easily express things like c=a+3*(a+b);
>> where 'a' appears twice. (At present, there's no way to generate it, 
>> but theoretically the compiler could provide it in many cases.)
>>
>> So -- if trees were available, I'm not sure if I'd use them, or not.
> 
> They're really handy.  I've found some uses for them already:
> 
> /*
> ZeroOrMoreExpr
>     = generate.ZeroOrMoreExpr()
>     ::= "["  "{"  Expression:expr  "}"  "]"  [RangeSetExpr:ranges ] 
> [Binding:binding ]
>         ["*" SubExpressionTerm:delim ] [SubExpressionTerm:term];
> */
> bool ZeroOrMoreExpr(inout char[] value,inout char[] bindings,inout 
> char[] tokens,inout char[] err){
>     return Rule!(
>         generate.ZeroOrMoreExpr,
>         And!(
>             Literal!("["),
>             Literal!("{"),
>             Bind!(Expression,"expr"),
>             Literal!("}"),
>             Literal!("]"),
>             Optional!(
>                 Bind!(RangeSetExpr,"ranges")
>             ),           
>             Optional!(
>                 Bind!(Binding,"binding")
>             ),
>             Optional!(
>                 And!(
>                     Literal!("*"),
>                     Bind!(SubExpressionTerm,"delim")
>                 )
>             ),
>             Bind!(SubExpressionTerm,"term")
>         )
>     )(value,bindings,tokens,err);       
> }
> 
> There's a *lot* going on in this sample.  Basically what you see here is 
> a chunk of the as-yet-experimental compile-time Enki parser.  This 
> piece parses the Zero-or-more expression part of the EBNF variant that 
> Enki supports.
> 
> The templates evaluate to CTFE functions that in turn make up the parser 
> when it's executed.  So there are several layers of compile-time evaluation 
> going on here.  Also, what's not obvious is that "char[] tokens" is 
> actually an *array of strings* that is stored in a single char[] array; 
> each string's length data is encoded as a size_t mapped onto the 
> appropriate number of chars.  The Bind!() expressions also map parsed 
> out data to a key/value set which is stored in a similar fashion (char[] 
> bindings).
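> 
> The length-prefix trick might look something like this (a hypothetical 
> sketch, not Enki's actual code):
> 
> // pack: prefix a string with its length, one byte per char
> char[] pack(char[] s)
> {
>     char[] header;
>     size_t len = s.length;
>     for (int i = 0; i < size_t.sizeof; ++i) {
>         header ~= cast(char)(len & 0xFF);
>         len >>= 8;
>     }
>     return header ~ s;
> }
> 
> // unpackFirst: recover the first string, slice off the remainder
> char[] unpackFirst(char[] packed, out char[] rest)
> {
>     size_t len = 0;
>     for (int i = size_t.sizeof - 1; i >= 0; --i)
>         len = (len << 8) | packed[i];
>     rest = packed[size_t.sizeof + len .. $];
>     return packed[size_t.sizeof .. size_t.sizeof + len];
> }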
> 
> The goals here were to side-step the limitations of D's identifier 
> name-length while providing a user-readable toolkit for parser 
> composition; the only limit now is placed on the length of any given 
> rule.  I haven't found a way to unify the compile-time and run-time 
> parser generation yet (one backend with two targets), but I anticipate 
> using something very similar to the above by providing a CT and RT 
> version of the template library.  At a minimum I plan on having this 
> consume a .bnf file at compile-time, in order to create a run-time 
> parser via mixin.
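> 
> Something along these lines ought to do it (a hypothetical sketch; 
> MakeParser is made up, but string imports and mixins are already there):
> 
> // read the grammar file at compile time (requires the -J switch)
> const char[] grammar = import("mylang.bnf");
> 
> // hypothetical CTFE function returning D source for the parser
> mixin(MakeParser(grammar));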
> 

Hey pragma,

Really cool. I've come up with a similar structure while experimenting
with a PEG parser in D (see attachment) after reading this article
series on CodeProject:

http://www.codeproject.com/cpp/crafting_interpreter_p1.asp
http://www.codeproject.com/cpp/crafting_interpreter_p2.asp
http://www.codeproject.com/cpp/crafting_interpreter_p3.asp

The template system of D does an awesome job in keeping
templated PEG grammars readable.
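
To give a flavour, the combinators boil down to something like this
(a stripped-down sketch, not the attached dpegtest.d itself; the names
are illustrative):

// match one literal character, consuming it on success
bool Char(char c)(inout char[] input)
{
    if (input.length && input[0] == c) {
        input = input[1 .. $];
        return true;
    }
    return false;
}

// PEG sequence: both must match, with backtracking on failure
bool Seq(alias A, alias B)(inout char[] input)
{
    char[] save = input;
    if (A(input) && B(input)) return true;
    input = save;
    return false;
}

// PEG ordered choice: try A first, fall back to B
bool Or(alias A, alias B)(inout char[] input)
{
    return A(input) || B(input);
}

// e.g.: alias Seq!(Char!('a'), Or!(Char!('b'), Char!('c'))) rule;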

BTW: If you turn Enki into a PEG-style parser, I'll definitely throw
my attempts into the dustbin :-)
Greets

KlausO


-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: dpegtest.d
URL: <http://lists.puremagic.com/pipermail/digitalmars-d/attachments/20070410/8ccb48ca/attachment.ksh>

