Let's stop parser Hell
Roman D. Boiko
rb at d-coding.com
Sat Jul 7 09:06:27 PDT 2012
On Saturday, 7 July 2012 at 15:42:30 UTC, Chad J wrote:
> On 07/05/2012 08:28 AM, Roman D. Boiko wrote:
>> Pegged may eventually become standard, if it will be
>> performanceoptimized and a bit more customizable
>
> Interesting, I thought you were hand-writing this stuff.
>
> I'm a fan of pegged and made some pull requests, one of which
> was aimed at making it more customizable by allowing the user
> to define what type gets used as a parse tree node, thus
> allowing one to potentially use their parse tree as an AST (and
> maybe do a couple other things too). It's a WIP, but the proof
> of concept is done: DMD can handle the extra templating at
> compile time, so it works.
>
> What kind of things did you want in terms of customizability?
There are many possible customization point which would make it a
better fit for my project while also being useful in general.
The most critical was the one you implemented: ability to define
a custom parse tree. I also needed the ability to use immutable
structures (almost) everywhere, while currently they must be
mutable. Also it would be great to avoid UTF conversions (instead
use functions and types templated by the string type). I also
wanted to add ability to reuse parts of previously generated AST
which correspond to source code that has not been changed, or to
identical source code parsed previously. (This would allow
incremental parsing, e.g., while user is typing.) The latter
would require to track source code positions separately, or even
generating them on demand (this is the feature actively
criticized by deadalnix in some previous discussions but central
to DCT architecture). AST would only hold information about
widths of its nodes.
I've also written some notes (10-15 suggestions) while studying
Pegged code, which will be shared later.
However, as I found a few hours ago, Packrat parsing (typically
used to handle PEG) has serious disadvantages: it complicates
debugging because of frequent backtracking, it has problems with
error recovery, and typically disallows to add actions with side
effects (because of possibility of backtracking). These are
important enough to reconsider my plans of using Pegged. I will
try to analyze whether the issues are so fundamental that I (or
somebody else) will have to create an ANTLR-like parser instead,
or whether it is possible to introduce changes into Pegged that
would fix these problems.
More information about the Digitalmars-d
mailing list