Lexer and parser generators using CTFE

Tue Feb 28 10:46:04 PST 2012

On Tue, 28 Feb 2012 08:59:21 +0100, Andrei Alexandrescu  
<SeeWebsiteForEmail at erdani.org> wrote:

> I'm starting a new thread on this because I think the matter is of  
> strategic importance.
>
> We all felt for a long time that there's a lot of potential in CTFE, and  
> potential applications have been discussed more than a few times,  
> ranging from formatting strings parsed to DSLs and parser generators.
>
> Such feats are now approaching fruition because a number of factors  
> converge:
>
> * Dmitry Olshansky's regex library (now in Phobos) generates efficient D  
> code straight from regexen.
>
> * The scope and quality of CTFE has improved enormously, making more  
> advanced uses possible and even relatively easy (thanks Don!)
>
> * Hisayuki Mima implemented a parser generator in only 3000 lines of  
> code (sadly, no comments or documentation yet :o))
>
> * On the occasion of that announcement we also found out that Philippe  
> Sigaud already has a competing design and implementation of a parser  
> generator.
>
> This is the kind of stuff I've had an eye on for the longest time. I'm  
> saying it's of strategic importance because CTFE technology, though not  
> new and already available with some languages, has unique powers when  
> combined with other features of D. With CTFE we get to do things that  
> are quite literally impossible to do in other languages.
>
> We need to have an easy-to-use, complete, seamless, and efficient  
> lexer-parser generator combo in Phobos, pronto. The lexer itself could  
> use a character-level PEG or a classic automaton, and emit tokens for  
> consumption by a parser generator. The two should work in perfect tandem  
> (no need for glue code). At the end of the day, defining a complete  
> lexer+parser combo for a language should be just a few lines longer than  
> the textual representation of the grammar itself.
>
> What do you all think? Let's get this project off the ground!
>
>
> Thanks,
>
> Andrei

I wrote a generic lexer generator some time ago.
It has already led to some O(N^2) optimizations in the compiler, because the
token declarations sneak into the symbol mangling :(.
I also finally added a workaround for a remaining CTFE bug (#6815).

https://gist.github.com/1255439 - lexer generator
https://gist.github.com/1262321 - complete and fast D lexer
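The gists are D code meant to generate the lexer at compile time. As a rough, language-neutral illustration of what a generic lexer generator does, here is a Python sketch. It assumes tokens are declared as (name, regex) pairs and combined into one alternation. `make_lexer`, `Token`, and the token set are hypothetical and are not taken from the gists. Unlike a CTFE-based generator, this one builds the lexer at run time.

```python
import re
from typing import Callable, Iterator, List, NamedTuple, Tuple


class Token(NamedTuple):
    kind: str  # name of the declaration that matched
    text: str  # matched source text
    pos: int   # offset of the token in the input


def make_lexer(decls: List[Tuple[str, str]],
               skip: Tuple[str, ...] = ("WS",)) -> Callable[[str], Iterator[Token]]:
    """Build a lexer from (name, regex) token declarations (hypothetical API)."""
    # Combine every declaration into one alternation of named groups.
    # Python's re tries alternatives left to right, so declaration order is priority.
    pattern = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in decls))

    def lex(src: str) -> Iterator[Token]:
        pos = 0
        while pos < len(src):
            m = pattern.match(src, pos)
            # Reject unknown input and zero-width matches (which would loop forever).
            if m is None or m.end() == pos:
                raise SyntaxError(f"unexpected character {src[pos]!r} at offset {pos}")
            if m.lastgroup not in skip:
                yield Token(m.lastgroup, m.group(), pos)
            pos = m.end()

    return lex


# Hypothetical token set for a tiny expression language.
TOKENS = [
    ("WS", r"\s+"),
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=()]"),
]

lex = make_lexer(TOKENS)
print([(t.kind, t.text) for t in lex("x = 40 + 2")])
# [('IDENT', 'x'), ('OP', '='), ('NUMBER', '40'), ('OP', '+'), ('NUMBER', '2')]
```

Declaration order doubles as priority here, because Python's `re` picks the first alternative that matches rather than the longest match.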

I've ditched an attempt to write a parser combinator. It was based on
expression templates and ended up in Spirit-like craziness.

A lot is becoming feasible from the CTFE perspective;
apart from some bug fixes, the only things I currently miss are exp and log.

I do not agree that it's the right moment to write a parser, though.
It runs straight into Phobos' two biggest shortcomings: the lack of a good
I/O system and the missing allocators.
Any parser written now will either risk not playing nicely with ranges
or have to come up with its own buffering again.
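To make the buffering point concrete, here is a minimal Python sketch. It assumes input arrives as an iterator of string chunks, standing in for a D input range over a file; it is not Phobos' API. A token can straddle two chunks, so the lexer has to keep its own carry-over buffer and hold back any match that touches the end of that buffer. `lex_chunks`, `_drain`, and the token rules are hypothetical.

```python
import re
from typing import Iterable, Iterator, Tuple

# Hypothetical token rules; whitespace is matched but not emitted.
PATTERN = re.compile(
    r"(?P<WS>\s+)|(?P<NUMBER>\d+)|(?P<IDENT>[A-Za-z_]\w*)|(?P<OP>[+\-*/=()])"
)


def _drain(buf: str, final: bool):
    """Emit every token in buf that is known to be complete; return the unlexed rest."""
    pos = 0
    while pos < len(buf):
        m = PATTERN.match(buf, pos)
        # With these rules every prefix of a token is itself a token,
        # so a failed match always means invalid input.
        if m is None:
            raise SyntaxError(f"unexpected character {buf[pos]!r}")
        # A match touching the end of the buffer may continue in the next chunk
        # ("12" followed by "34"), so hold it back unless no more input follows.
        if m.end() == len(buf) and not final:
            break
        if m.lastgroup != "WS":
            yield (m.lastgroup, m.group())
        pos = m.end()
    return buf[pos:]


def lex_chunks(chunks: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Lazily lex input that arrives in arbitrary chunks (e.g. file reads)."""
    buf = ""
    for chunk in chunks:
        # The lexer owns this buffer: leftover text is carried into the next chunk.
        buf = yield from _drain(buf + chunk, final=False)
    yield from _drain(buf, final=True)


# Token boundaries deliberately straddle chunk boundaries.
print(list(lex_chunks(["x = 1", "2 + y", "z"])))
# [('IDENT', 'x'), ('OP', '='), ('NUMBER', '12'), ('OP', '+'), ('IDENT', 'yz')]
```

The held-back text is rescanned once the next chunk arrives. That is cheap here, but it is exactly the kind of ad-hoc buffering each parser would otherwise have to reimplement.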