Let's stop parser Hell

Roman D. Boiko rb at d-coding.com
Sat Jul 7 09:06:27 PDT 2012


On Saturday, 7 July 2012 at 15:42:30 UTC, Chad J wrote:
> On 07/05/2012 08:28 AM, Roman D. Boiko wrote:
>> Pegged may eventually become standard, if it will be 
>> performanceoptimized and a bit more customizable
>
> Interesting, I thought you were hand-writing this stuff.
>
> I'm a fan of pegged and made some pull requests, one of which 
> was aimed at making it more customizable by allowing the user 
> to define what type gets used as a parse tree node, thus 
> allowing one to potentially use their parse tree as an AST (and 
> maybe do a couple other things too). It's a WIP, but the proof 
> of concept is done: DMD can handle the extra templating at 
> compile time, so it works.
>
> What kind of things did you want in terms of customizability?

There are many possible customization point which would make it a 
better fit for my project while also being useful in general.

The most critical was the one you implemented: ability to define 
a custom parse tree. I also needed the ability to use immutable 
structures (almost) everywhere, while currently they must be 
mutable. Also it would be great to avoid UTF conversions (instead 
use functions and types templated by the string type). I also 
wanted to add ability to reuse parts of previously generated AST 
which correspond to source code that has not been changed, or to 
identical source code parsed previously. (This would allow 
incremental parsing, e.g., while user is typing.) The latter 
would require to track source code positions separately, or even 
generating them on demand (this is the feature actively 
criticized by deadalnix in some previous discussions but central 
to DCT architecture). AST would only hold information about 
widths of its nodes.

I've also written some notes (10-15 suggestions) while studying 
Pegged code, which will be shared later.

However, as I found a few hours ago, Packrat parsing (typically 
used to handle PEG) has serious disadvantages: it complicates 
debugging because of frequent backtracking, it has problems with 
error recovery, and typically disallows to add actions with side 
effects (because of possibility of backtracking). These are 
important enough to reconsider my plans of using Pegged. I will 
try to analyze whether the issues are so fundamental that I (or 
somebody else) will have to create an ANTLR-like parser instead, 
or whether it is possible to introduce changes into Pegged that 
would fix these problems.


More information about the Digitalmars-d mailing list