DeRailed DSL (was Re: compile-time regex redux)

Andrei Alexandrescu (See Website For Email) SeeWebsiteForEmail at erdani.org
Fri Feb 9 20:41:18 PST 2007


Bill Baxter wrote:
> Andrei Alexandrescu (See Website For Email) wrote:
>> This is a misunderstanding. The syntax is not to be extended. It stays 
>> fixed, and that is arguably a good thing. The semantics become more 
>> flexible. For example, they will make it easy to write a matrix 
>> operation:
>>
>> A = (I - B) * C + D
>>
>> and generate highly performant code from it. (There are many reasons 
>> for which that's way harder than it looks.)
> 
> This is one thing I haven't really understood in the discussion.  How do 
> the current proposals help that case?  From what I'm getting you're 
> going to have to write every statement like the above as something like:
> 
> mixin {
>    ProcessMatrixExpr!( "A = (I - B) * C + D;" );
> }
> 
> How do you get from this mixin/string processing stuff to an API that I 
> might actually be willing to use for common operations?

That's a great starting point for a good conversation. And no, you 
wouldn't do it with strings; you'd do it exactly as I wrote, with 
regular code. You could use strings if you wanted more math-looking 
operators, Greek symbols, etc.
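
Here's a minimal sketch of what I mean, with all names illustrative 
rather than a worked-out design (and with * taken element-wise for 
brevity; a real matrix product needs the kernel treatment discussed 
below):

struct Matrix
{
    double[] data;
    size_t rows, cols;

    double opIndex(size_t i, size_t j) { return data[i * cols + j]; }
    void opIndexAssign(double v, size_t i, size_t j)
    {
        data[i * cols + j] = v;
    }

    // Assigning from any expression runs one fused element-wise loop;
    // no intermediate matrices are ever allocated.
    void opAssign(E)(E e) if (!is(E == Matrix))
    {
        foreach (i; 0 .. rows)
            foreach (j; 0 .. cols)
                this[i, j] = e[i, j];
    }

    // Operators build expression nodes instead of computing anything.
    auto opBinary(string op, R)(R rhs)
    {
        return BinExpr!(op, Matrix, R)(this, rhs);
    }
}

// One node of the expression tree; the operator is part of the type.
struct BinExpr(string op, L, R)
{
    L lhs;
    R rhs;

    double opIndex(size_t i, size_t j)
    {
        return mixin("lhs[i, j] " ~ op ~ " rhs[i, j]");
    }

    auto opBinary(string op2, R2)(R2 rhs2)
    {
        return BinExpr!(op2, typeof(this), R2)(this, rhs2);
    }
}

With that in place, client code is just:

A = (I - B) * C + D; // builds a type; one fused loop runs in opAssign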

>> I think there is a lot of apprehension and misunderstanding 
>> surrounding what metacode is able and supposed to do or simplify. 
>> Please, let's focus on understanding _before_ forming an opinion.
> 
> I think that's exactly what Tom's getting at.  He's asking for examples 
> of how this would make life better for him and others.  I think given 
> your background you take it for granted that metaprogramming is the 
> future.  But D attracts folks from all walks of life, 
> because it promises to be a kinder, gentler C++.  So some people here 
> aren't even sure why D needs templates at all.  Fortran doesn't have 'em 
> after all.  And Java just barely does.

Then such people would be hard-pressed to figure out why tuples (which are 
essentially typelists, the single most used component in Loki) have 
engendered, according to Walter himself, a surge in language popularity.

I don't even think metaprogramming is the future. I think that it's one 
of many helpful tools for writing good libraries, as are GC, modules, and 
interfaces. *Everybody* appreciates good libraries.

> Anyway, I think it would help get everyone on board if some specific and 
> useful examples were given of how this solves real problems (and no, I 
> don't really consider black and white holes as solving real problems.  I 
> couldn't even find any non-astronomical uses of the terms in a Google 
> search.)

The terms are recent and introduced by the Perl community. I happened to 
like the metaphor. The established name is "null object pattern".
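
For concreteness, a black hole is simply a null object that absorbs 
any use. A minimal sketch (using opDispatch, so illustrative only; 
Phobos later acquired BlackHole/WhiteHole in std.typecons along these 
lines):

struct BlackHole
{
    // Swallow any method name with any arguments; return the black
    // hole itself so chained calls compile too.
    BlackHole opDispatch(string name, Args...)(Args args)
    {
        return this;
    }
}

void main()
{
    BlackHole log;                 // e.g. a do-nothing logger
    log.write("x =", 42).flush();  // absorbed; no effect
}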

> For instance, it would be nice to see some more concrete discussion about
> * the "rails" case.
> * the X = A*B + C matrix/vector expressions case.
> * the case of generating bindings to scripting languages / ORB stubs
> * the Spirit/parser generator case
> 
> So here's my take on vector expressions since that's the only one I know 
> anything about.
> 
> *Problem statement*:
> Make the expression A=(I-B)*C+D efficient, where the variables are large 
> vectors (I'll leave out matrices for now).
> 
> *Why it's hard*:
> The difficulty is that (ignoring SSE instruction etc) the most efficient 
> way to compute that is do all operations component-wise.  So instead of 
> computing I-B then multiplying by C, you compute
>     A[i] = (I[i]-B[i])*C[i]+D[i];
> for each i.  This eliminates the need to allocate large intermediate 
> vectors.

There's much more than that. (First off, things are more complicated in 
the case of matrices.) You want to be gentle on cache, so when you have 
any column-wise iteration you want to do matrix blocking. Depending on 
the expression, you want to select different block sizes.

Then you also want to do optional partial unrolling *on top of 
blocking*. Again, the amount of unrolling might depend on the 
expression. (You don't want to unroll large expressions.) At this point 
things are utterly messy.

Iterating may also be better row-wise or column-wise, depending on the 
expression. Some subexpressions have a "preferential" row-wise scanning 
order and a "possible" column-wise order. Others must do things one way 
or the other. For all this the appropriate code must be generated.
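
To make the blocking point concrete, here is the shape of the code 
that has to be generated when one operand is scanned against the grain 
(a hand-written sketch; the block size, and any unroll factor, would 
be chosen per expression):

import std.algorithm.comparison : min;

enum blockSize = 64; // illustrative only

// A = transpose(B) + C: B is read column-wise, so tile the iteration
// to keep both access patterns cache-friendly.
void evalTiled(double[][] A, double[][] B, double[][] C)
{
    immutable n = A.length;
    for (size_t bi = 0; bi < n; bi += blockSize)
        for (size_t bj = 0; bj < n; bj += blockSize)
            foreach (i; bi .. min(bi + blockSize, n))
                foreach (j; bj .. min(bj + blockSize, n))
                    A[i][j] = B[j][i] + C[i][j];
}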

> *Existing solutions*:
>   Expression templates in C++, e.g. the Blitz++ library.  Instead of 
> making opSub in I-B return a new Vector object, you make opSub return an 
> ExpressionTemplate object.  This is a little template struct that 
> contains a reference to I and to B, and knows how to subtract the two in 
> a component-wise manner.  The types of I and B are template parameters, 
> LeftT and RightT. Its interface also allows it to be treated just like a 
> Vector.  You can add a vector to it, subtract a Vector from it etc.
> 
> Now we go and try to multiply that result times C.  The result of that 
> is a new MultExpressionTemplate with two parameters, the LeftT being our 
> previous SubExpressionTemplate!(Vector,Vector) and the RightT being Vector.
> 
> Proceeding in this way, eventually the result of the math is of type:
> 
> AddExpressionTemplate!(
>    MultExpressionTemplate!(
>       SubExpressionTemplate!(Vector,Vector),
>       Vector),
>    Vector)
> 
> And you can see that we basically have a parse tree expressed as nested 
> templates.  The final trick is that a Vector.opAssign that takes an 
> ExpressionTemplate is provided and that method calls a method of the 
> expression template to finally trigger the calculation, like 
> expr.eval(this).  eval() has the top-level loop over the components of 
> the answer.
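
In D terms, with the BinExpr sketch from earlier in this message, the 
same parse tree shows up as a nested template type, and one can even 
check it:

Matrix A, B, C, D, I;
static assert(is(typeof((I - B) * C + D)
    == BinExpr!("+",
                BinExpr!("*", BinExpr!("-", Matrix, Matrix), Matrix),
                Matrix)));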
> 
> *Why Existing solutions are insufficient*
> For all that effort to actually be useful, the compiler has to be 
> pretty aggressive about inlining everything so that in the end all the 
> temporary template structs and function calls go away and you're just 
> left with one eval call.  It can be tricky to get the code into just the 
> right configuration so that the compiler will do the inlining.  And even 
> then results will depend on the quality of the compiler.  My attempts 
> using MSVC++ several years ago always came out to be slower than the 
> naive version, but with about 10x the amount of code.  The code is also 
> pretty difficult to follow because of all those different types of 
> expression templates that have to be created.
> 
> If you include matrices things are even trickier because there are 
> special cases like "A*B+a*C" that can be computed efficiently by a 
> single optimized routine.  You'd like to recognize such cases and turn 
> them into single calls to that fast routine.  You might also want to 
> recognize "A*B+C" as a special case of that with a==1.
> 
> *How it could be improved*:
> ??? this is what I'd like to see explained better.

Great. One problem is that building ET libraries is exceedingly hard 
because of the multiple issues that must be solved simultaneously. To 
get a glimpse into the kind of problems that we are talking about, see 
e.g. http://www.adtmag.com/joop/carticle.aspx?ID=627. Libraries cannot 
afford to implement an optimal solution, partly because they have so 
much language-y mud to go through, and partly because the C++ language 
simply does not offer the code generation abilities that are needed.

I haven't sat down to design a linear algebra library using D's new abilities, 
but they are definitely helpful in that specifying an elementary matrix 
operation (the core of a loop) only needs one template. For example, if 
you want to specify the kernel of a matrix addition, you can say it as a 
type:

MatrixOp!("lhs[i, j] + rhs[i, j]", rowwise, columnwise)

The type specifies the kernel of the operation, conventionally expressed 
in terms of lhs, rhs, i, and j, then the preferred iteration order, and 
finally the alternate order. The MatrixOp template defines a member 
function get(uint i, uint j) that expands the string given in the first 
argument.

The kernel of multiplication is:

MatrixOp!("sum!(k, 0, n, lhs[i, k], rhs[k, j])", rowwise, columnwise, 
blocking!(8), unrolling!(8))

(sum is not doable in today's D because D lacks the ability to inject a 
new alias name into a template upon invocation.)

This type specifies how an element of the multiplication is obtained, 
again what preferred and alternate method of iteration is possible, and 
also specifies blocking and unrolling suggestions.
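
(Compile-time evaluation does allow a textual approximation of sum: 
substitute a placeholder for the missing alias while building the 
string. A hedged sketch, names mine, not a proposed API:)

import std.array : replace;
import std.conv : to;

// Expand term n times, textually replacing the placeholder var with
// the running index, and join the copies with "+".
enum sumExpr(string term, string var, size_t n) = () {
    string s;
    foreach (k; 0 .. n)
        s ~= (k ? " + " : "") ~ term.replace(var, k.to!string);
    return s;
}();

static assert(sumExpr!("lhs[i, K] * rhs[K, j]", "K", 2)
    == "lhs[i, 0] * rhs[0, j] + lhs[i, 1] * rhs[1, j]");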

In a complex expression, lhs and rhs will bind to subexpressions; it's 
easy to then generate the appropriate code upon assignment to the 
target, with heuristically chosen blocking and unrolling parameters 
based on the suggestions made by the subexpressions. All this is doable 
without difficulty through manipulating and generating code during 
compilation.
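
To make the mechanism tangible, here is a toy rendition reusing the 
Matrix of the earlier sketch (everything hypothetical; the preference 
arguments are carried in the type but not interpreted here):

// The kernel travels as a string in the type and is expanded by
// mixin inside the generated loop.
struct MatrixOp(string kernel, Prefs...)
{
    Matrix lhs, rhs; // a full design binds these to subexpressions

    double get(size_t i, size_t j)
    {
        return mixin(kernel); // e.g. "lhs[i, j] + rhs[i, j]"
    }
}

// Assignment expands the kernel into a single fused loop; blocking
// and unrolling decisions would be layered onto this skeleton.
void assign(string kernel, Prefs...)(ref Matrix target,
                                     MatrixOp!(kernel, Prefs) op)
{
    foreach (i; 0 .. target.rows)
        foreach (j; 0 .. target.cols)
            target[i, j] = op.get(i, j);
}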


Andrei


