Lexer and parser generators using CTFE
philippe.sigaud at gmail.com
Wed Feb 29 01:45:20 PST 2012
>> > On Wednesday, February 29, 2012 02:16:12 Christopher Bergqvist wrote:
>> > > I agree that the current direction of D in this area is impressive.
>> > > However, I fail to see a killer-feature in generating a lexer-parser
>> > > generator at compile-time instead of run-time.
>> > >
> CTFE parsing is especially useful for DSELs (Domain Specific Embedded
> Languages) or internal DSLs. The advantages are:
> 1. Syntactic errors (in the parsed constructs) are given out at compile
> time.
> 2. D reflections are available only at compile time. Referencing the
> variables/identifiers in the parsed subset of DSL with the mainstream D
> code is impossible without reflections in place.
One of my goals while writing a CT grammar generator was to get a
compile-time parse tree. Since it contains strings, it's easy to walk the
tree, assembling strings as you go and generating the code you want
(if/when you want to write code, that is).
Strings are a D way to represent code, so any way to get structured strings
at compile-time opens whole vistas for code generation.
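To make the "walk the tree, assemble strings" idea concrete, here is a minimal sketch. The ParseTree struct, its field names and the toStruct function are hypothetical, invented for this illustration; they are not the types my generator produces. The tree is built by hand to stand in for the output of a compile-time parser.

```d
import std.stdio;

// Hypothetical parse-tree node; the field names are illustrative only.
struct ParseTree
{
    string name;          // rule that produced this node
    string match;         // text matched by this node
    ParseTree[] children; // sub-nodes
}

// Walk the tree and assemble D source code as a string.
// Here a "Record" node with "Field" children becomes a struct declaration.
string toStruct(ParseTree tree)
{
    string code = "struct " ~ tree.match ~ " {\n";
    foreach (child; tree.children)
        code ~= "    int " ~ child.match ~ ";\n";
    return code ~ "}\n";
}

// A hand-built tree standing in for the result of a compile-time parse.
enum tree = ParseTree("Record", "Point",
    [ParseTree("Field", "x"), ParseTree("Field", "y")]);

// CTFE evaluates toStruct(tree); the mixin compiles the resulting string.
mixin(toStruct(tree));

void main()
{
    auto p = Point(1, 2);
    writeln(p.x + p.y); // prints 3
}
```

The generated struct Point only exists because the string was available at compile time, which is the whole point of getting the parse tree during compilation.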
As for semantic actions, I added them in my code yesterday. I had hopes of
using D's new anonymous function syntax (p => p), but because they are
anonymous, they cannot be inserted easily in string mixins (other modules do
not know about __lambda1 and co).
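A small sketch of why named functions are easier here: generated code refers to an action by its identifier, so that identifier must be visible where the string is mixed in. The helper callByName and the action twice are hypothetical names for this illustration only; __traits(identifier, ...) is the standard D trait.

```d
// Build a code string that calls an action by its name.
// callByName is a hypothetical helper, not part of any library.
string callByName(alias action)(string arg)
{
    return __traits(identifier, action) ~ "(" ~ arg ~ ")";
}

// A named action: its identifier is stable and can be referenced
// from generated code wherever the function is in scope.
int twice(int p) { return 2 * p; }

// The generated code is just text naming the function...
static assert(callByName!twice("21") == "twice(21)");

// ...and mixing it in works because "twice" is visible here.
enum result = mixin(callByName!twice("21"));
static assert(result == 42);

void main() {}
```

With a lambda, the identifier is a compiler-generated name, which is what makes the same trick awkward across module boundaries.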
Anyway, I now have semantic actions at compile-time. I used them to write a
small (as in, veeery simple) XML parser: semantic actions push node names
when an opening tag is encountered and pop the last tag when a closing tag
is encountered. It seems to work OK.
That looks a bit like this (sorry, writing on a pad):
mixin(Grammar!("Doc <- Node*"
               "Node <- OpeningTag (Text / Node)* ClosingTag", NodeAction,
               "OpeningTag <- '<' Identifier '>'", OpeningAction,
               "ClosingTag <- `</` Identifier '>'", ClosingAction,
               "Text <- (!(OpeningTag / ClosingTag) _)+"));
The PEG for Text just means: any char, as long as it's not the start of an
OpeningTag or a ClosingTag. PEGs use '.' to say 'any char', but I wanted to
be able to deal with qualified names, so I chose '_' instead.
When there is no action, it defaults to NoOp, as is the case for Doc, Node
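The push/pop idea behind the opening and closing actions can be sketched without the grammar generator. This is only an approximation under stated assumptions: openingAction, closingAction and wellFormed are hypothetical names, and the scanner is hand-written rather than a PEG. It only shows that this kind of stack handling runs fine under CTFE.

```d
// Hypothetical stand-ins for the opening/closing actions: they only
// maintain a stack of currently open tag names.
void openingAction(ref string[] stack, string name)
{
    stack ~= name; // push the tag name when an opening tag is seen
}

bool closingAction(ref string[] stack, string name)
{
    // Pop the last tag; fail if it does not match the closing tag.
    if (stack.length == 0 || stack[$ - 1] != name)
        return false;
    stack = stack[0 .. $ - 1];
    return true;
}

// Hand-written scanner (not a PEG): it only understands <name> and
// </name>; every other character is treated as text.
bool wellFormed(string input)
{
    string[] stack;
    size_t i = 0;
    while (i < input.length)
    {
        if (input[i] != '<')
        {
            ++i; // plain text character
            continue;
        }
        bool closing = i + 1 < input.length && input[i + 1] == '/';
        size_t start = i + (closing ? 2 : 1);
        size_t end = start;
        while (end < input.length && input[end] != '>')
            ++end;
        if (end == input.length)
            return false; // unterminated tag
        string name = input[start .. end];
        if (closing)
        {
            if (!closingAction(stack, name))
                return false;
        }
        else
            openingAction(stack, name);
        i = end + 1;
    }
    return stack.length == 0; // every opened tag was closed
}

// Both checks are evaluated during compilation through CTFE.
static assert(wellFormed("<a><b>text</b></a>"));
static assert(!wellFormed("<a><b>text</a></b>"));

void main() {}
```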
I also added named captures (and capture comparison), to be able to say: "I
want a sequence of equal chars":
"Equal <- _@first (_=first)*"
That is: any char, store it as "first", then take any number of chars, as
long as their match is equal to first's match.
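For reference, the behaviour of the Equal rule can be written by hand as a plain function. This is a sketch of what the rule matches, not of how the generated parser is implemented; matchEqual is a hypothetical name, and returning 0 for "no match" is a convention chosen for this example (a successful match always consumes at least one char).

```d
// Returns the length of the prefix matched by the Equal rule:
// one arbitrary char, then as many chars equal to it as possible.
// A result of 0 means the rule did not match.
size_t matchEqual(string input)
{
    if (input.length == 0)
        return 0;               // the rule needs at least one char
    immutable first = input[0]; // plays the role of the named capture
    size_t n = 1;
    while (n < input.length && input[n] == first)
        ++n;                    // keep going while chars equal "first"
    return n;
}

// Evaluated at compile time through CTFE.
static assert(matchEqual("aaab") == 3);
static assert(matchEqual("abc") == 1);
static assert(matchEqual("") == 0);

void main() {}
```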
All this works at CT.
I'm afraid being on holiday right now means I do not have easy access to
GitHub (no git on a pad, and the computer I use to code right now does not
have any network connection). I'll put all this online in a few days,
because this must seem like the ramblings of a madman right now...