Let's stop parser Hell

Tue Jul 10 03:41:12 PDT 2012

On Saturday, 7 July 2012 at 16:37:56 UTC, Roman D. Boiko wrote:
>> Note that PEG does not impose to use packrat parsing, even 
>> though it was developed to use it. I think it's a historical 
>> 'accident' that put the two together: Bryan Ford thesis used 
>> the two together.
>>
>> Note that many PEG parsers do not rely on packrat (Pegged does 
>> not).
>> There are a bunch of articles on Bryan Ford's website by a guy
>> writting a PEG parser for Java, and who found that storing the 
>> last rules was enought to get a slight speed improvement, buth 
>> that doing anymore sotrage was detrimental to the parser's 
>> overall efficiency.
>
> That's great! Anyway I want to understand the advantages and 
> limitations of both Pegged and ANTLR, and probably study some 
> more techniques. Such research consumes a lot of time but can 
> be done incrementally along with development.

One disadvantage of Packrat parsers I mentioned was problematic 
error recovery (according to the article from ANTLR website). 
After some additional research, I found that it is not a critical 
problem. To find the exact place of error (from parser's 
perspective, not user's) one only needs to remember the farthest 
successfully parsed position (among several backtracking 
attempts) and the reason that it failed.

It is also possible to rerun parsing with some additional 
heuristics after failing, thus enabling advanced error repair 
scenarios.

Since Pegged doesn't use Packrat algorithm, this solution might 
be either not relevant or not applicable, but I doubt that there 
will be any fundamental problem with error recovery.

Unpleasant debugging experience, however, should be relevant for 
any parser that uses backtracking heavily.