Let's stop parser Hell

Philippe Sigaud philippe.sigaud at gmail.com
Sat Jul 7 09:26:40 PDT 2012


On Sat, Jul 7, 2012 at 6:06 PM, Roman D. Boiko <rb at d-coding.com> wrote:

> The most critical was the one you implemented: ability to define a custom
> parse tree. I also needed the ability to use immutable structures (almost)
> everywhere, while currently they must be mutable. Also it would be great to
> avoid UTF conversions (instead use functions and types templated by the
> string type).

I added dstrings because

1- at the time (a few months ago), the lists here were awash in UTF-32
discussions and I thought that'd be the way to go anyway
2- other D parsing libraries seemed to go to UTF32 also (CTPG)
3- I wanted to be able to parse mathematical notation like nabla,
derivatives, etc. which all have UTF32 symbols.


> However, as I found a few hours ago, Packrat parsing (typically used to
> handle PEG) has serious disadvantages: it complicates debugging because of
> frequent backtracking, it has problems with error recovery, and typically
> disallows to add actions with side effects (because of possibility of
> backtracking). These are important enough to reconsider my plans of using
> Pegged. I will try to analyze whether the issues are so fundamental that I
> (or somebody else) will have to create an ANTLR-like parser instead, or
> whether it is possible to introduce changes into Pegged that would fix these
> problems.

Note that PEG does not impose to use packrat parsing, even though it
was developed to use it. I think it's a historical 'accident' that put
the two together: Bryan Ford thesis used the two together.

Note that many PEG parsers do not rely on packrat (Pegged does not).
There are a bunch of articles on Bryan Ford's website by a guy
writting a PEG parser for Java, and who found that storing the last
rules was enought to get a slight speed improvement, buth that doing
anymore sotrage was detrimental to the parser's overall efficiency.


More information about the Digitalmars-d mailing list