Lexer and parser generators using CTFE

Tue Feb 28 17:35:44 PST 2012

On 29 February 2012 14:16, Christopher Bergqvist
<spambox0 at digitalpoetry.se> wrote:
> On Tuesday, 28 February 2012 at 08:36:13 UTC, CTFE-4-the-win wrote:
>>
>> On Tuesday, 28 February 2012 at 07:59:16 UTC, Andrei Alexandrescu
>> wrote:
>>>
>>> I'm starting a new thread on this because I think the matter is of
>>> strategic importance.
>>>
>>> We all felt for a long time that there's a lot of potential in CTFE, and
>>> potential applications have been discussed more than a few times, ranging
>>> from formatting strings parsed to DSLs and parser generators.
>>>
>>> Such feats are now approaching fruition because a number of factors
>>> converge:
>>>
>>> * Dmitry Olshansky's regex library (now in Phobos) generates efficient D
>>> code straight from regexen.
>>>
>>> * The scope and quality of CTFE has improved enormously, making more
>>> advanced uses possible and even relatively easy (thanks Don!)
>>>
>>> * Hisayuki Mima implemented a parser generator in only 3000 lines of code
>>> (sadly, no comments or documentation yet :o))
>>>
>>> * With the occasion of that announcement we also find out Philippe Sigaud
>>> has already a competing design and implementation of a parser generator.
>>>
>>> This is the kind of stuff I've had an eye on for the longest time. I'm
>>> saying it's of strategic importance because CTFE technology, though not new
>>> and already available with some languages, has unique powers when combined
>>> with other features of D. With CTFE we get to do things that are quite
>>> literally impossible to do in other languages.
>>>
>>> We need to have a easy-to-use, complete, seamless, and efficient
>>> lexer-parser generator combo in Phobos, pronto. The lexer itself could use a
>>> character-level PEG or a classic automaton, and emit tokens for consumption
>>> by a parser generator. The two should work in perfect tandem (no need for
>>> glue code). At the end of the day, defining a complete lexer+parser combo
>>> for a language should be just a few lines longer than the textual
>>> representation of the grammar itself.
>>>
>>> What do you all think? Let's get this project off the ground!
>>>
>>>
>>> Thanks,
>>>
>>> Andrei
>>
>>
>> Definitely, I applaud this initiative! I've long been of the
>> opinion that CTFE parsing is D's killer-feature, which would
>> allow me to "sneak" D into a "nameless above average size
>> company". ;)
>
>
> I agree that the current direction of D in this area is impressive.
>  However, I fail to see a killer-feature in generating a lexer-parser
> generator at compile-time instead of run-time.
>
> A run-time generator would benefit from not having to execute within the
> limited CTFE environment and would always be on-par in that respect.  A
> compile time generator would internalize the generation and compilation of
> the result (with possible glue-code), simplifying the build process
> somewhat.
>
> What am I failing to pick up on?

As far as I can tell, the advantage of compile-time parser-generation
is that you get significantly better start-up times for the program.
Generating a lexer/parser at runtime produces signficant overhead.
Also, if you want to do a yacc-style action system, then it being
created at compile-time means that you don't have the overhead of
using anonymous functions or delegates at runtime, you can just give
it a raw string of D code.

The CTFE environment is not that limited, and has enough functionality
to produce a lexer/parser.

Complex language definitions would be benefited by CTFE
parser-generation since the load times to create the generator can be
quite long, a problem for programs that may not be long-running (a
lint tool for example).

--
James Miller