DeRailed DSL (was Re: compile-time regex redux)
Andrei Alexandrescu (See Website For Email)
SeeWebsiteForEmail at erdani.org
Sun Feb 11 08:58:16 PST 2007
Don Clugston wrote:
> Andrei Alexandrescu (See Website for Email) wrote:
>> Bill Baxter wrote:
>>> Andrei Alexandrescu (See Website For Email) wrote:
>>>> This is a misunderstanding. The syntax is not to be extended. It
>>>> stays fixed, and that is arguably a good thing. The semantics become
>>>> more flexible. For example, they will make it easy to write a matrix
>>>> operation:
>>>>
>>>> A = (I - B) * C + D
>>>>
>>>> and generate highly performant code from it. (There are many reasons
>>>> for which that's way harder than it looks.)
>>>
>>> This is one thing I haven't really understood in the discussion. How
>>> do the current proposals help that case? From what I'm getting
>>> you're going to have to write every statement like the above as
>>> something like:
>>>
>>> mixin {
>>>     ProcessMatrixExpr!( "A = (I - B) * C + D;" );
>>> }
>>>
>>> How do you get from this mixin/string processing stuff to an API that
>>> I might actually be willing to use for common operations?
>>
>> That's a great starting point for a good conversation. And no, you
>> wouldn't do it with strings, you'd do it exactly as I wrote, with
>> regular code. You could use strings if you want to use more math-looking
>> operators and Greek symbols etc.
>>
>>>> I think there is a lot of apprehension and misunderstanding
>>>> surrounding what metacode is able and supposed to do or simplify.
>>>> Please, let's focus on understanding _before_ forming an opinion.
>>>
>>> I think that's exactly what Tom's getting at. He's asking for
>>> examples of how this would make life better for him and others. I
>>> think given your background you take it for granted that
>>> metaprogramming is the future. But D attracts folks from all
>>> walks of life, because it promises to be a kinder, gentler
>>> C++. So some people here aren't even sure why D needs templates at
>>> all. Fortran doesn't have 'em after all. And Java just barely does.
>>
>> Then such people would be hard-pressed to figure why tuples (which are
>> essentially typelists, the single most used component in Loki) have
>> engendered, according to Walter himself, a surge in language popularity.
>>
>> I don't even think metaprogramming is the future. I think that it's one
>> of many helpful tools for writing good libraries, as are GC, modules, or
>> interfaces. *Everybody* appreciates good libraries.
>>
>>> Anyway, I think it would help get everyone on board if some specific
>>> and useful examples were given of how this solves real problems (and
>>> no I don't really consider black and white holes as solving real
>>> problems. I couldn't even find any non-astronomical uses of the
>>> terms in a google search.)
>>
>> The terms are recent and introduced by the Perl community. I happened to
>> like the metaphor. The consecrated name is "null object pattern".
>>
>>> For instance, it would be nice to see some more concrete discussion
>>> about
>>> * the "rails" case.
>>> * the X = A*B + C matrix/vector expressions case.
>>> * the case of generating bindings to scripting languages / ORB stubs
>>> * the Spirit/parser generator case
>>>
>>> So here's my take on vector expressions since that's the only one I
>>> know anything about.
>>>
>>> *Problem statement*:
>>> Make the expression A=(I-B)*C+D efficient, where the variables are
>>> large vectors (I'll leave out matrices for now).
>>>
>>> *Why it's hard*:
>>> The difficulty is that (ignoring SSE instruction etc) the most
>>> efficient way to compute that is do all operations component-wise.
>>> So instead of computing I-B then multiplying by C, you compute
>>> A[i] = (I[i]-B[i])*C[i]+D[i];
>>> for each i. This eliminates the need to allocate large intermediate
>>> vectors.
>>
>> There's much more than that. (First off, things are more complicated in
>> the case of matrices.) You want to be gentle on cache, so when you have
>> any column-wise iteration you want to do matrix blocking. Depending on
>> the expression, you want to select different block sizes.
>>
>> Then you also want to do optionally partial unrolling *on top of
>> blocking*. Again, the amount of unrolling might depend on the
>> expression. (You don't want to unroll large expressions.) At this point
>> things are utterly messy.
>>
>> Iterating may also be better row-wise or column-wise, depending on the
>> expression. Some subexpressions have a "preferential" row-wise scanning
>> order and a "possible" column-wise order. Others must do things one way
>> or the other. For all this the appropriate code must be generated.
>>
>>> *Existing solutions*:
>>> Expression templates in C++, e.g. The Blitz++ library. Instead of
>>> making opSub in I-B return a new Vector object, you make opSub return
>>> an ExpressionTemplate object. This is a little template struct that
>>> contains a reference to I and to B, and knows how to subtract the two
>>> in a component-wise manner. The types of I and B are template
>>> parameters, LeftT and RightT. Its interface also allows it to be
>>> treated just like a Vector. You can add a vector to it, subtract a
>>> Vector from it etc.
>>>
>>> Now we go and try to multiply that result times C. The result of
>>> that is a new MultExpressionTemplate with two parameters, the LeftT
>>> being our previous SubExpressionTemplate!(Vector,Vector) and the
>>> RightT being Vector.
>>>
>>> Proceeding on in this way eventually the result of the math is of type:
>>>
>>> AddExpressionTemplate!(
>>>     MultExpressionTemplate!(
>>>         SubExpressionTemplate!(Vector, Vector),
>>>         Vector),
>>>     Vector)
>>>
>>> And you can see that we basically have a parse tree expressed as
>>> nested templates. The final trick is that a Vector.opAssign that
>>> takes an ExpressionTemplate is provided and that method calls a
>>> method of the expression template to finally trigger the calculation,
>>> like expr.eval(this). eval() has the top-level loop over the
>>> components of the answer.
>>>
>>> *Why Existing solutions are insufficient*
>>> For that all that effort to actually be useful the compiler has to be
>>> pretty aggressive about inlining everything so that in the end all the
>>> temporary template structs and function calls go away and you're just
>>> left with one eval call. It can be tricky to get the code into just
>>> the right configuration so that the compiler will do the inlining.
>>> And even then results will depend on the quality of the compiler. My
>>> attempts using MSVC++ several years ago always came out to be slower
>>> than the naive version, but with about 10x the amount of code. The
>>> code is also pretty difficult to follow because of all those
>>> different types of expression templates that have to be created.
>>>
>>> If you include matrices things are even trickier because there are
>>> special cases like "A*B+a*C" that can be computed efficiently by a
>>> single optimized routine. You'd like to recognize such cases and
>>> turn them into single calls to that fast routine. You might also
>>> want to recognize "A*B+C" as a special case of that with a==1.
>>>
>>> *How it could be improved*:
>>> ??? this is what I'd like to see explained better.
>>
>> Great. One problem is that building ET libraries is exceedingly hard
>> because of the multiple issues that must be solved simultaneously. To
>> get a glimpse into the kind of problems that we are talking about, see
>> e.g. http://www.adtmag.com/joop/carticle.aspx?ID=627. Libraries cannot
>> afford to implement an optimal solution, partly because they have so
>> much language-y mud to go through, and partly because the C++ language
>> simply does not offer the code generation abilities that are needed.
>>
>> I haven't sat down to design a linear algebra library using D's new abilities,
>> but they are definitely helpful in that specifying an elementary matrix
>> operation (the core of a loop) only needs one template. For example, if
>> you want to specify the kernel of a matrix addition, you can say it as a
>> type:
>>
>> MatrixOp!("lhs[i, j] + rhs[i, j]", rowwise, columnwise)
>>
>> The type specifies the kernel of the operation, conventionally expressed
>> in terms of lhs, rhs, i, and j, then the preferred iteration order, and
>> finally the alternate order. The MatrixOp template defines a member
>> function get(uint i, uint j) that expands the string given as its first
>> argument.
>>
>> The kernel of multiplication is:
>>
>> MatrixOp!("sum!(k, 0, n, lhs[i, k] * rhs[k, j])", rowwise, columnwise,
>> blocking!(8), unrolling!(8))
>>
>> (sum is not doable in today's D because D lacks the ability to inject a
>> new alias name into a template upon invocation.)
>>
>> This type specifies how an element of the multiplication is obtained,
>> again what preferred and alternate method of iteration is possible, and
>> also specifies blocking and unrolling suggestions.
>>
>> In a complex expression, lhs and rhs will bind to subexpressions; it's
>> easy to then generate the appropriate code upon assignment to the
>> target, with heuristically chosen blocking and unrolling parameters
>> based on the suggestions made by the subexpressions. All this is doable
>> without difficulty through manipulating and generating code during
>> compilation.
>>
>>
>> Andrei
>
> I've also done some experimentation on this problem.
> To do a perfect solution, you'd need to be able to identify when a
> variable is used twice, for example,
>
> a = 3.0 * b + (c - a * 1.5);
> The compiler knows that 'a' occurs twice, but that information is not
> transferred into expression templates.
I think what you really need is aliasing information (e.g. two names
referring to the same vector), and that is typically not easily computable.
> The compiler may also know the
> length of some of the arrays, and that is also lost.
Ah, indeed. So that would mean better error checking and probably less
runtime bounds checking.
> I've almost completed a BLAS1 generator for x87 (not SSE), which spits
> out _optimal_ asm code for any combination of vector-scalar operations
> (vec+vec, vec-vec, vec*real, dot(vec, vec)) for any mixture of 32-, 64-,
> and 80-bit vectors. The cache effects are simple in this case.
> The code to do this is amazingly compact, thanks to tuples and the new
> 1.005 mixins (in fact, it is a small fraction of the size of an asm
> BLAS1 kernel).
> I noticed that the 'expression template' part of the code contained
> almost nothing specific to vectors; I think there's potential for a
> library mixin component to do it.
>
> Of course another approach would be to add
> opAssignExpression(char [] expr)()
> {
> }
> where "expr" is the verbatim expression.
> a = b*5 + c*d;
>
> would become
> mixin a.opAssignExpression!("a=b*5+c*d");
>
> since it seems that with expression templates, you're fighting the
> compiler, attempting to undo everything it has done.
Not everything - e.g., precedence of operators remains unchanged. In the
string-based approach you'd have to write a little parser to reimplement
operator precedences. But point taken.
Andrei
More information about the Digitalmars-d
mailing list