Let's stop parser Hell

Sat Jul 7 03:24:24 PDT 2012

On Saturday, 7 July 2012 at 09:06:57 UTC, Roman D. Boiko wrote:
> http://stackoverflow.com/questions/11373644/performance-of-parsers-peg-vs-lalr1-or-llk

So far it looks like LALR parsers may have lower constant factors 
than Packrat.

The difference could be minimized by paying attention to parsing 
of terminal symbols, which was in my plans already. It is not 
necessary to strictly follow Packrat parsing algorithm.

The benefits of Pegged, in my view, are its support of Parsing 
Expression Grammar (PEG) and compile-time evaluation. It is 
easily extensible and modifiable.

When I implemented recursive-descent parser by hand in one of 
early drafts of DCT, I strongly felt the need to generalize code 
in a way which in retrospect I would call PEG-like. The structure 
of my hand-written recursive-descent parser was a one-to-one 
mapping to an implemented subset of D specification, and I 
considered it problematic, because it was needed to duplicate the 
same structure in the resulting AST.

PEG is basically a language that describes both, the 
implementation of parser, and the language syntax. It greatly 
reduces implicit code duplication.

I think that generated code can be made almost as fast as a 
hand-written parser for a particular language (probably, a few 
percent slower). Especially if that language is similar to D 
(context-free, with fine-grained hierarchical grammar). 
Optimizations might require to forget about strictly following 
any theoretical algorithm, but that should be OK.