Let's stop parser Hell

Roman D. Boiko rb at d-coding.com
Sat Jul 7 14:49:51 PDT 2012


On Saturday, 7 July 2012 at 21:35:52 UTC, Dmitry Olshansky wrote:
> On 08-Jul-12 01:24, Roman D. Boiko wrote:
>> But PEG itself doesn't require backtracking parsing, does it?
>
> It does. Specifically the algorithm is to try option A, try 
> option B, try option C...
There is no algorithm, only __grammar__. It specifies that option 
A has higher priority than option B in expression A / B / C. But 
it doesn't forbid, for example, to analyze all in parallel (track 
state transitions) for each character consequently, as a proper 
DFA implementation would do, until some option succeeds and all 
previous fail. A greedy regex would also have to check all 
successor options (C in the exapmle above) to determine the 
longest one.

> it's obvious how backtracking comes in ... whan say A fails 
> somewhere in the middle of it.
Simply stop tracking respective state. Process others in parallel 
as I described above.

> Of course there is a fair amount of bookkeeping that reduces 
> doing work all over again but it does ... allocate.

The amount of used memory would be proportional to the length of 
input after the branching point times number of rules (actually, 
of states in the DFA, which is likely larger but still finite for 
a particular grammar).

>>> Tokens.. there is no such term in use if we talk about 'pure' 
>>> PEG.
>>
>> Terminal symbols.
>>
> Characters.
Nope. According to the definition, PEG is a set of non-terminal 
symbols, terminal symbols, production rules, and a starting 
non-terminal. Terminals are not necessarily characters.


More information about the Digitalmars-d mailing list