DeRailed DSL (was Re: compile-time regex redux)

Andrei Alexandrescu (See Website For Email) SeeWebsiteForEmail at erdani.org
Fri Feb 9 16:02:40 PST 2007


Tom S wrote:
> When the compiler is used for the processing of a DSL, it simply masks a 
> simple step that an external tool would do. It's not much of a problem 
> to run an external script or program, that will read the DSL and output 
> D code, while perhaps also doing other stuff, connecting with databases 
> or making coffee.

This is a misrepresentation. Code generation with external tools has 
been done forever; it is never easy (unless all you need is a table of 
logarithms), and it always incurs a non-amortized cost of parsing the 
DSL _plus_ the host language. Look at lex and yacc, the prototypical 
examples. They are neither small nor simple, nor are they perfectly 
integrated with the host language. And their DSL is extremely well 
understood. That's why there's no proliferation of lex&yacc-like tools 
for other DSLs (I seem to recall there was an embedded SQL that got 
lost in the noise): the code generator would basically have to rewrite 
a significant part of the compiler to do anything interesting.

Even lex and yacc are often dropped in favor of Xpressive and Spirit, 
which, for all their odd syntax, are 100% integrated with the host 
language; that integration allows writing fully expressive code without 
fear that the tool won't understand this or won't recognize that. 
People have gone to amazing lengths to stay within the language, and 
guess why - because within the language you're immersed in the 
environment that your DSL lives in.
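
To make "within the language" concrete, here's a rough sketch in D 
using string mixins - the names are invented and it's only an 
illustration, not a recommended design. The point is that the DSL is 
translated by an ordinary function evaluated at compile time, and the 
generated code lands in the very scope that uses it; there is no second 
parse of the host language and no separate build step.

import std.stdio;

// Hypothetical translator: turns a comma-separated list of field names
// (the "DSL") into D member declarations. It's a plain function, but it
// runs at compile time when its result feeds a mixin.
string fieldsFrom(string dsl)
{
    string code, name;
    foreach (c; dsl ~ ",")               // trailing comma flushes the last name
    {
        if (c == ',')
        {
            if (name.length) code ~= "int " ~ name ~ ";\n";
            name = "";
        }
        else if (c != ' ')
            name ~= c;
    }
    return code;
}

struct Point
{
    mixin(fieldsFrom("x, y, z"));        // generated members live in this scope
}

void main()
{
    Point p;
    p.x = 1; p.y = 2; p.z = 3;
    writefln("%s", p.x + p.y + p.z);     // prints 6
}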

Reducing the whole issue to a mythical external code generator that 
does it all and makes coffee is simplistic. Proxy/stub generators for 
remote procedure calls were always an absolute pain to deal with; now 
compilers do it automatically, because they can. Understanding that 
that door can, and should, be opened to the programmer is an essential 
step in appreciating the power of metacode.

> But when this is moved to the compiler, security 
> problems arise, code becomes more cryptic and suddenly, the D code 
> generated from the DSL cannot be simply accessed. It's simply produced 
> by the 'compiler extension' and given further into compilation. A 
> standalone tool will produce a .d module, which can be further verified, 
> processed by other tools - such as one that generates reflection data - 
> and when something breaks, one can step into the generated source, 
> review it and easier spot errors in it. Sean also mentioned that the DSL 
> processor will probably need some diagnostic output, and simple 
> pragma(msg) and static assert simply won't cut it.

I don't see this as a strong argument. Tools can get better, no question 
about that. But their current defects should be judged with their 
potential power in mind. Heck, nobody would have bought the first 
lightbulb or the first automobile. They sucked.
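
On the diagnostics point specifically: here's a rough sketch (made-up 
names, nothing definitive) of what pragma(msg) and static assert can 
already report from inside a compile-time DSL processor. Crude, yes, 
but the error does point at the offending DSL text.

// Hypothetical validator for some little DSL. A failed check stops
// compilation with a message naming the bad input; pragma(msg) can
// trace what was processed.
string checkBalanced(string dsl)
{
    int depth = 0;
    foreach (c; dsl)
    {
        if (c == '(') ++depth;
        if (c == ')') --depth;
    }
    return depth == 0 ? "" : "unbalanced parentheses in DSL: " ~ dsl;
}

template Validate(string dsl)
{
    enum error = checkBalanced(dsl);
    static assert(error.length == 0, error);
    enum Validate = dsl;                 // hand the DSL on when it checks out
}

pragma(msg, "processing DSL: " ~ Validate!"(a + b) * c");
// Validate!"(a + b * c" would abort compilation with the message above.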

> Therefore, I'd like to see a case when a compile-time DSL processing is 
> going to be really useful in the real world, as to provoke further 
> complication of the compiler and its usage patterns.

I can't parse this sentence. Did you mean "as opposed to provoking" 
instead of "as to provoke"?

> The other aspect I observed in the discussions following the new dmd 
> release, is that folks are suggesting writing full language parsers, D 
> preprocessors and various sort of operations on complex languages, 
> notably extended-D parsing... This may sound weird in my lips, but 
> that's clearly abuse. What these people really want is a way to extend 
> the language's syntax, pretty much as Nemerle does it. And frankly, if D 
> is going to be a more powerful language, built-in text preprocessing 
> won't cut it. Full fledged macro support and syntax extension mechanics 
> are something that we should look at.

This is a misunderstanding. The syntax is not to be extended. It stays 
fixed, and that is arguably a good thing. The semantics become more 
flexible. For example, they will make it easy to write a matrix operation:

A = (I - B) * C + D

and generate highly performant code from it. (There are many reasons 
why that's way harder than it looks.)
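
To give a rough idea of one line of attack - and of where it stops 
being easy - here's a sketch in the expression-template style, with all 
names invented. Each operator builds a lazy node instead of computing a 
temporary, and the assignment runs one fused loop over the elements. 
That handles the additions and subtractions; the actual matrix product 
in the expression above is where a naive approach falls apart and real 
code generation earns its keep.

import std.stdio;

double plus(double a, double b)  { return a + b; }
double minus(double a, double b) { return a - b; }

// A concrete vector (think: one row of a matrix).
struct Vec
{
    double[] data;
    double opIndex(size_t i) { return data[i]; }
    size_t length() { return data.length; }
}

// Lazy element-wise node: nothing is computed until opIndex is called, so
// an arbitrarily deep expression still costs one pass and no temporaries.
struct BinExpr(alias op, L, R)
{
    L lhs;
    R rhs;
    double opIndex(size_t i) { return op(lhs[i], rhs[i]); }
    size_t length() { return lhs.length; }
}

auto add(L, R)(L l, R r) { return BinExpr!(plus, L, R)(l, r); }
auto sub(L, R)(L l, R r) { return BinExpr!(minus, L, R)(l, r); }

// One fused loop writes the result.
void assign(E)(ref Vec dst, E expr)
{
    foreach (i; 0 .. dst.length)
        dst.data[i] = expr[i];
}

void main()
{
    auto b = Vec([1.0, 2.0, 3.0]);
    auto c = Vec([4.0, 5.0, 6.0]);
    auto d = Vec([7.0, 8.0, 9.0]);
    auto a = Vec(new double[3]);

    // Element-wise analogue of A = (B - C) + D.
    assign(a, add(sub(b, c), d));
    writefln("%s", a.data);              // [4, 5, 6]
}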

I think there is a lot of apprehension and misunderstanding surrounding 
what metacode can do and is supposed to do or simplify. Please, let's 
focus on understanding _before_ forming an opinion.


Andrei


