Let's stop parser Hell
Roman D. Boiko
rb at d-coding.com
Sat Jul 7 03:24:24 PDT 2012
On Saturday, 7 July 2012 at 09:06:57 UTC, Roman D. Boiko wrote:
> http://stackoverflow.com/questions/11373644/performance-of-parsers-peg-vs-lalr1-or-llk
So far it looks like LALR parsers may have lower constant factors
than Packrat.
The difference could be minimized by paying attention to parsing
of terminal symbols, which was in my plans already. It is not
necessary to strictly follow Packrat parsing algorithm.
The benefits of Pegged, in my view, are its support of Parsing
Expression Grammar (PEG) and compile-time evaluation. It is
easily extensible and modifiable.
When I implemented recursive-descent parser by hand in one of
early drafts of DCT, I strongly felt the need to generalize code
in a way which in retrospect I would call PEG-like. The structure
of my hand-written recursive-descent parser was a one-to-one
mapping to an implemented subset of D specification, and I
considered it problematic, because it was needed to duplicate the
same structure in the resulting AST.
PEG is basically a language that describes both, the
implementation of parser, and the language syntax. It greatly
reduces implicit code duplication.
I think that generated code can be made almost as fast as a
hand-written parser for a particular language (probably, a few
percent slower). Especially if that language is similar to D
(context-free, with fine-grained hierarchical grammar).
Optimizations might require to forget about strictly following
any theoretical algorithm, but that should be OK.
More information about the Digitalmars-d
mailing list