DeRailed DSL (was Re: compile-time regex redux)

Andrei Alexandrescu (See Website For Email) SeeWebsiteForEmail at erdani.org
Fri Feb 9 19:01:11 PST 2007


Bill Baxter wrote:
> Bill Baxter wrote:
>> Andrei Alexandrescu (See Website For Email) wrote:
>>> Tom S wrote:
>>
>>>> Therefore, I'd like to see a case when a compile-time DSL processing 
>>>> is going to be really useful in the real world, as to provoke 
>>>> further complication of the compiler and its usage patterns.
>>>
>>> I can't parse this sentence. Did you mean "as opposed to provoking" 
>>> instead of "as to provoke"?
>>
>> I think he just meant he wants to see some real world examples that 
>> justify the additional complexity that will be added to the compiler 
>> and language.
>>
>>>> The other aspect I observed in the discussions following the new dmd 
>>>> release is that folks are suggesting writing full language parsers, 
>>>> D preprocessors, and various sorts of operations on complex languages, 
>>>> notably extended-D parsing... This may sound weird coming from me, but 
>>>> that's clearly abuse. What these people really want is a way to 
>>>> extend the language's syntax, pretty much as Nemerle does it. And 
>>>> frankly, if D is going to be a more powerful language, built-in text 
>>>> preprocessing won't cut it. Full-fledged macro support and syntax 
>>>> extension mechanisms are something we should look at.
>>>
>>> This is a misunderstanding. The syntax is not to be extended. It 
>>> stays fixed, and that is arguably a good thing. The semantics become 
>>> more flexible. For example, they will make it easy to write a matrix 
>>> operation:
>>>
>>> A = (I - B) * C + D
>>>
>>> and generate highly performant code from it. (There are many reasons 
>>> for which that's way harder than it looks.)
>>
>> This is one thing I haven't really understood in the discussion.  How 
>> do the current proposals help that case?  From what I'm getting, you're 
>> going to have to write every statement like the above as something like:
>>
>> mixin( ProcessMatrixExpr!( "A = (I - B) * C + D;" ) );
>>
>> How do you get from this mixin/string processing stuff to an API that 
>> I might actually be willing to use for common operations?
>>
>>
>>> I think there is a lot of apprehension and misunderstanding 
>>> surrounding what metacode is able and supposed to do or simplify. 
>>> Please, let's focus on understanding _before_ forming an opinion.
>>
>> I think that's exactly what Tom's getting at.  He's asking for 
>> examples of how this would make life better for him and others.  I 
>> think, given your background, you take it for granted that 
>> metaprogramming is the future.  But D attracts folks from all 
>> walks of life, because it promises to be a kinder, gentler C++.  So 
>> some people here aren't even sure why D needs templates at all.  
>> Fortran doesn't have 'em, after all.  And Java just barely does.
>>
>> Anyway, I think it would help get everyone on board if some specific 
>> and useful examples were given of how this solves real problems (and 
>> no, I don't really consider black and white holes to be solving real 
>> problems.  I couldn't even find any non-astronomical uses of the terms 
>> in a Google search.)
>>
>> For instance, it would be nice to see some more concrete discussion about
>> * the "rails" case.
>> * the X = A*B + C matrix/vector expressions case.
>> * the case of generating bindings to scripting languages / ORB stubs
>> * the Spirit/parser generator case
>>
>> So here's my take on vector expressions since that's the only one I 
>> know anything about.
>>
>> *Problem statement*:
>> Make the expression A=(I-B)*C+D efficient, where the variables are 
>> large vectors (I'll leave out matrices for now).
>>
>> *Why it's hard*:
>> The difficulty is that (ignoring SSE instructions, etc.) the most 
>> efficient way to compute that is to do all operations component-wise.  So 
>> instead of computing I-B and then multiplying by C, you compute
>>     A[i] = (I[i]-B[i])*C[i]+D[i];
>> for each i.  This eliminates the need to allocate large intermediate 
>> vectors.
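>>
>> For concreteness, here's a rough sketch of the two versions in D 
>> (Vector is just an alias for double[] here, and all the names are 
>> made up):
>>
>> alias double[] Vector;
>>
>> // Naive: each operator allocates and fills a fresh temporary vector.
>> Vector naive(Vector I, Vector B, Vector C, Vector D)
>> {
>>     auto t1 = new double[I.length];              // I - B
>>     foreach (i, ref x; t1) x = I[i] - B[i];
>>     auto t2 = new double[I.length];              // (I - B) * C
>>     foreach (i, ref x; t2) x = t1[i] * C[i];
>>     auto A  = new double[I.length];              // ... + D
>>     foreach (i, ref x; A)  x = t2[i] + D[i];
>>     return A;
>> }
>>
>> // Fused: one allocation, one pass, no intermediate vectors.
>> Vector fused(Vector I, Vector B, Vector C, Vector D)
>> {
>>     auto A = new double[I.length];
>>     foreach (i, ref x; A)
>>         x = (I[i] - B[i]) * C[i] + D[i];
>>     return A;
>> }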
>>
>> *Existing solutions*:
>>   Expression templates in C++, e.g. the Blitz++ library.  Instead of 
>> making opSub in I-B return a new Vector object, you make opSub return 
>> an ExpressionTemplate object.  This is a little template struct that 
>> contains a reference to I and to B, and knows how to subtract the two 
>> in a component-wise manner.  The types of I and B are template 
>> parameters, LeftT and RightT. Its interface also allows it to be 
>> treated just like a Vector.  You can add a Vector to it, subtract a 
>> Vector from it, etc.
>>
>> Now we go and try to multiply that result times C.  The result of that 
>> is a new MultExpressionTemplate with two parameters, the LeftT being 
>> our previous SubExpressionTemplate!(Vector,Vector) and the RightT 
>> being Vector.
>>
>> Proceeding on in this way eventually the result of the math is of type:
>>
>> AddExpressionTemplate!(
>>    MultExpressionTemplate!(
>>       SubExpressionTemplate!(Vector,Vector),
>>       Vector),
>>    Vector)
>>
>> And you can see that we basically have a parse tree expressed as 
>> nested templates.  The final trick is that a Vector.opAssign overload that 
>> takes an ExpressionTemplate is provided, and that method calls a method of 
>> the expression template, like expr.eval(this), to finally trigger the 
>> calculation.  eval() has the top-level loop over the components of 
>> the answer.
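>>
>> To make that concrete, here's a very stripped-down sketch of the same 
>> idea in D (using today's operator-overloading syntax; the names are 
>> invented and a real library needs far more machinery).  Each operator 
>> returns a tiny node struct instead of a Vector, and opAssign runs the 
>> single component-wise loop:
>>
>> // One node of the "parse tree"; op is "-", "*" or "+".
>> struct BinExpr(string op, L, R)
>> {
>>     L left; R right;
>>     mixin Operators;
>>     // Component-wise evaluation of this node.
>>     double opIndex(size_t i) { return mixin("left[i] " ~ op ~ " right[i]"); }
>> }
>>
>> struct Vec
>> {
>>     double[] data;
>>     mixin Operators;
>>     double opIndex(size_t i) { return data[i]; }
>>
>>     // Assigning from an expression node is the eval() step: one loop,
>>     // no temporaries (data is assumed to be allocated already).
>>     void opAssign(E)(E e) if (!is(E == Vec))
>>     {
>>         foreach (i, ref x; data) x = e[i];
>>     }
>> }
>>
>> // Shared by Vec and by the nodes so expressions can nest:
>> // (I - B) * C + D builds the type
>> // BinExpr!("+", BinExpr!("*", BinExpr!("-", Vec, Vec), Vec), Vec).
>> mixin template Operators()
>> {
>>     auto opBinary(string op, R)(R rhs)
>>     {
>>         return BinExpr!(op, typeof(this), R)(this, rhs);
>>     }
>> }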
>>
>> *Why existing solutions are insufficient*
>> For all that effort to actually be useful, the compiler has to be 
>> pretty aggressive about inlining everything so that in the end all the 
>> temporary template structs and function calls go away and you're just 
>> left with one eval call.  It can be tricky to get the code into just 
>> the right configuration so that the compiler will do the inlining.  
>> And even then results will depend on the quality of the compiler.  My 
>> attempts using MSVC++ several years ago always came out to be slower 
>> than the naive version, but with about 10x the amount of code.  The 
>> code is also pretty difficult to follow because of all those different 
>> types of expression templates that have to be created.
>>
>> If you include matrices things are even trickier because there are 
>> special cases like "A*B+a*C" that can be computed efficiently by a 
>> single optimized routine.  You'd like to recognize such cases and turn 
>> them into single calls to that fast routine.  You might also want to 
>> recognize "A*B+C" as a special case of that with a==1.
>>
>> *How it could be improved*:
>> ??? this is what I'd like to see explained better.
> 
> Ok, here's my high-level stab at how the new stuff could help.
> Instead of returning "expression templates" where the parse tree is 
> represented as a hierarchical type, just make opSub (I-B) return an 
> ExpressionWrapper which is just a light wrapper around the string "I-B". 
>  Next, when the compiler gets to "ExpressionWrapper * C", have that just 
> return another ExpressionWrapper that modifies the original string to be
>    "(" ~ origstring ~ ")*C"
> In the end you still have an opAssign overload that calls eval() on the 
> expression, but now the expression carries a (fully parenthesized) string 
> representation of itself:  "((I-B)*C)+D".  And eval just has to 
> use the new compile-time string parsing tricks to decide how best to 
> evaluate that expression.  It's a pattern matching task, and thankfully 
> now you have the whole pattern in one place, as opposed to the 
> ExpressionTemplates, which have a hard time really getting a look at the 
> whole picture.
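>
> To sketch the mechanics in D (every name below is hypothetical, and only 
> the expression-times-Vector cases are handled): operator overloads never 
> see the source-level names "I", "B", etc., so this wrapper refers to 
> operands by position, and the pretty "I-B" form would have to come from 
> an outer mixin instead.  The string is a template argument, so eval() 
> has the whole expression available at compile time:
>
> import std.conv : to;
>
> struct Vec
> {
>     double[] data;
>     double opIndex(size_t i) { return data[i]; }
>
>     // The first operation starts the compile-time expression string;
>     // operands are referred to by position: a[0], a[1], ...
>     auto opBinary(string op)(Vec r)
>     {
>         return Expr!("a[0][i]" ~ op ~ "a[1][i]", Vec, Vec)(this, r);
>     }
>
>     void opAssign(string s, Ops...)(Expr!(s, Ops) e)
>     {
>         e.eval(this);
>     }
> }
>
> struct Expr(string s, Ops...)
> {
>     Ops a;   // the operand vectors, in order of first appearance
>
>     // Each further operation parenthesizes the string so far and appends.
>     auto opBinary(string op)(Vec r)
>     {
>         enum ns = "(" ~ s ~ ")" ~ op ~ "a[" ~ to!string(Ops.length) ~ "][i]";
>         return Expr!(ns, Ops, Vec)(a, r);
>     }
>
>     // eval() sees the complete string at compile time -- this is where the
>     // pattern matching would go (e.g. spotting an "A*B+a*C" shape and
>     // calling one optimized routine).  Here it just mixes the string back
>     // in as the body of the component-wise loop.
>     void eval(Vec target)
>     {
>         foreach (i, ref x; target.data)
>             x = mixin(s);
>     }
> }
>
> // A = (I - B) * C + D then builds
> //   Expr!("((a[0][i]-a[1][i])*a[2][i])+a[3][i]", Vec, Vec, Vec, Vec)
> // and opAssign evaluates it in a single loop.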

You're on the right track!!! I was writing an answer, but I need to 
leave so I have to interrupt it.

Thanks for lending a hand :o).


Andrei


