DeRailed DSL (was Re: compile-time regex redux)

Mon Feb 12 00:32:35 PST 2007

Don Clugston wrote:
> Andrei Alexandrescu (See Website For Email) wrote:
>> Don Clugston wrote:
>>> To do a perfect solution, you'd need to be able to identify when a 
>>> variable is used twice, for example,
>>>
>>> a = 3.0 * b + (c - a * 1.5);
>>> The compiler knows that 'a' occurs twice, but that information is not 
>>> transferred into expression templates.
>>
>> I think what you really need is aliasing information (e.g. two names 
>> referring to the same vector), and that is typically not easily 
>> computable.
> 
> Ideally, yes. But there's value in knowing names which are *definitely* 
> aliased, even if the majority of names have indeterminate aliasing. I 
> think that most of the value comes from the simple cases, the ultimate 
> example being the += operator.
> 
> Example: for x87, the most severe limitation on expression complexity is 
> not the number of temporaries, but the number of distinct vectors (you 
> run out of pointer registers before you run out of floating-point stack).
> 
>>> The compiler may also know the length of some of the arrays, and that 
>>> is also lost.
>>
>> Ah, indeed. So that would mean better error checking and probably less 
>> runtime bounds checking.
> 
> Exactly. And there's implications for loop unrolling, too. (eg, loop 
> unrolling is very attractive if you know the length is a multiple of 4).
> 
>>> Of course another approach would be to add
>>> opAssignExpression(char [] expr)()
>>> {
>>> }
>>> where "expr" is the verbatim expression.
>>> a = b*5 + c*d;
>>>
>>> would become
>>> mixin a.opAssignExpression!("a=b*5+c*d");
>>>
>>> since it seems that with expression templates, you're fighting the 
>>> compiler, attempting to undo everything it has done.
>>
>> Not everything - e.g., precedence of operators remains unchanged. In 
>> the string-based approach you'd have to write a little parser to 
>> reimplement operator precedences. But point taken.
> 
> I've thought about this a bit more. The compiler has done some useful 
> constant folding for you, and parsed all the literals. So ideally you 
> want it in a partially digested form.
> My implementation converts the expression into a tuple of values 
> (basically a stack), and a char [] of postfix operations to be performed 
> on that stack. Something like this is probably close to ideal, 
> especially if the compiler could remove known duplicate (aliased) values 
> from the tuple.
> 
> There would need to be a canonical form for the operations, though -- 
> postfix is great for Forth, but looks thoroughly out of place in D.
> eg
> a = 3.0 * b + (c - a * 1.5);
> becomes
> "v2 v3 * v4 v5 v1 * - v1 ="
> with the tuple being v[1] = a, v[2] = 3.0, v[3] = b, v[4] = c, v[5] = 1.5
> But perhaps a
> char [] ConvertToPostfix(char [] expr) library metafunction is all 
> that's required.

Or go with prefix notation, and use some sort of nesting indicator, like 
idunno, uh, parentheses.  ;-)

   char [] ConvertToLisp(char [] dexpr)

Seriously though, it may be that that's pretty much what we're going to 
need in the end.  Greenspun's tenth law and all.  If you want to 
manipulate a textual form of parse trees in the most convenient way 
possible you're pretty much going to end up with Lisp Sexprs.   You'll 
just want to get out of Lisp in the end to spit out the D code:

    char [] ConvertLispToD(char [] sexpr)

I could imagine worse things happening.  D becomes a better C++ with 
embedded compile-time Lisp for extensions.

--bb