Lexer and parser generators using CTFE

Tue Feb 28 06:58:08 PST 2012

(2012/02/28 16:59), Andrei Alexandrescu wrote:
> I'm starting a new thread on this because I think the matter is of
> strategic importance.
>
> We all felt for a long time that there's a lot of potential in CTFE, and
> potential applications have been discussed more than a few times,
> ranging from formatting strings parsed to DSLs and parser generators.
>
> Such feats are now approaching fruition because a number of factors
> converge:
>
> * Dmitry Olshansky's regex library (now in Phobos) generates efficient D
> code straight from regexen.
>
> * The scope and quality of CTFE has improved enormously, making more
> advanced uses possible and even relatively easy (thanks Don!)
>
> * Hisayuki Mima implemented a parser generator in only 3000 lines of
> code (sadly, no comments or documentation yet :o))
>
> * With the occasion of that announcement we also find out Philippe
> Sigaud has already a competing design and implementation of a parser
> generator.
>
> This is the kind of stuff I've had an eye on for the longest time. I'm
> saying it's of strategic importance because CTFE technology, though not
> new and already available with some languages, has unique powers when
> combined with other features of D. With CTFE we get to do things that
> are quite literally impossible to do in other languages.
>
> We need to have a easy-to-use, complete, seamless, and efficient
> lexer-parser generator combo in Phobos, pronto. The lexer itself could
> use a character-level PEG or a classic automaton, and emit tokens for
> consumption by a parser generator. The two should work in perfect tandem
> (no need for glue code). At the end of the day, defining a complete
> lexer+parser combo for a language should be just a few lines longer than
> the textual representation of the grammar itself.
>
> What do you all think? Let's get this project off the ground!
>
>
> Thanks,
>
> Andrei

Hello Andrei,

Certainly, I don't write yet the documentation of my library, ctpg.
(But a few examples available here: 
https://github.com/youkei/ctpg/tree/master/example)
So, I'd like to describe it here.

To be honest, ctpg is inspired by one module of Scala's standard 
library, Parser Combinators.
One of the differences between Parser Combinators and ctpg is that 
Parser Combinators generates parsers in run-time, but ctpg generates 
parsers in compile-time by the power of CTFE and mixin.

A parser generated by ctpg is a recursive descent parser, so it does 
lexical analysis and parsing at a time. And the type of input which it 
can accept is string, wstring, dstring and ForwardRange whose element 
type is char, wchar or dchar.
So, dividing lexical analysis and parsing into two processes is 
difficult in ctpg.

Importantly, a parser generated by ctpg is statically typed as one of 
the examples, "parsing simple arithmetic expression" shows.
Generally a parser generated by other tool and accepting tokens returns 
the abstract syntax tree, but it return the evaluated value in the example.
In other words, it does lexical analysis, parsing and (type) converting 
at a time.
If you want simply abstract syntax tree, it may be a little pain to use 
ctpg.

That's all I'd like to say about ctpg here.

By the way, I wholeheartedly agree with Andrei's opinion. But currently, 
I think, CTFE is limited because of this issue: 
http://d.puremagic.com/issues/show_bug.cgi?id=6498 .
Without dealing with this issue, We couldn't bring out the full 
potential of CTFE.

Thanks,
Hisayuki Mima