Let's stop parser Hell

Roman D. Boiko rb at d-coding.com
Sat Jul 7 15:32:47 PDT 2012


On Saturday, 7 July 2012 at 22:05:05 UTC, Dmitry Olshansky wrote:
> That would not always work, but yes that's what we should do 
> IMHO.
I haven't done any deeper research on this; I just shared my current 
understanding. When it doesn't work in a particular case, we will 
need to find the problem and solve it.

> It's not what packrats do however (when I say algorithm I mean 
> specifically packrat). And PEGs are commonly associated with 
> packrats.
As Philippe Sigaud admitted, that is an incorrect association.

>>> it's obvious how backtracking comes in ... when, say, A fails 
>>> somewhere in the middle of it.
>> Simply stop tracking the respective state. Process the others in 
>> parallel, as I described above.
>>
> Yeah, it could be done, but it doesn't buy you a thing in a lot 
> of cases, unlike in a regular grammar. The problem is that the 
> state is not as simple as in a regex, for instance (even in a 
> regex that must capture some submatches a DFA won't do, BTW). 
> Thus you can't merge seemingly identical states, because they 
> depend on (different) interpretations of the previous part of 
> the input.
Well, if you must track the path to a particular state, then a DFA 
simply doesn't match the problem and there's nothing we can do.
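
To make this concrete, here is a rough sketch in D of what I mean by 
dropping a failed alternative and advancing the others in parallel. 
The step() function, the integer state numbering and the delta 
transition function are invented for the illustration; they are not 
taken from any existing parser.

import std.algorithm : sort, uniq;
import std.array : array;

// Hypothetical transition function: successor states of `state` on
// input character `c`; an empty result means this alternative fails.
alias Transition = int[] function(int state, char c);

// Advance all live alternatives by one character. A failing
// alternative contributes nothing to `next`, i.e. it is simply
// dropped; there is no backtracking over already-consumed input.
// Merging duplicates is only valid when a state carries no extra
// interpretation of the input consumed so far, which is exactly the
// limitation you point out.
int[] step(int[] live, char c, Transition delta)
{
    int[] next;
    foreach (s; live)
        next ~= delta(s, c);
    return next.sort().uniq().array; // merge identical states
}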

> That's why the DFA in ANTLR is only used for lookahead or lexing; 
> it doesn't maintain a path through states during parsing.
See above. Is maintaining the path really necessary? I thought that 
a DFA with additional states could be created for any particular 
situation of ambiguity, etc. This needs further research.

>>> Of course there is a fair amount of bookkeeping that reduces 
>>> doing work all over again, but it does ... allocate.
>>
>> The amount of used memory would be proportional to the length of 
>> input after the branching point times the number of rules 
>> (actually, of states in the DFA, which is likely larger but still 
>> finite for a particular grammar).
>>
> Finite and proportional to input size? Another way of saying why 
> packrats have an O(n) memory requirement?
Sorry, I incorrectly included the O(n) multiplier; the bound should 
be finite, depending on the grammar only. When you go to the next 
character, you don't need the previous states any more.
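
As a toy illustration of that correction (the three-state transition 
table and the two-symbol alphabet below are entirely made up), the 
only thing kept while scanning is the current set of live states, 
whose size is bounded by the number of states in the automaton, 
independently of the input length:

import std.stdio;

enum States = 3;

// transition[state][symbol]; -1 means this alternative dies.
immutable int[2][States] transition = [
    [1, 2],
    [1, -1],
    [-1, 2],
];

void main()
{
    bool[States] live = [true, false, false]; // start set
    auto input = [0, 0, 0];                   // toy input symbols

    foreach (sym; input)
    {
        bool[States] next; // all false by default
        foreach (s; 0 .. States)
            if (live[s] && transition[s][sym] >= 0)
                next[transition[s][sym]] = true;
        live = next; // the previous set is not needed any more
    }
    writeln(live); // which alternatives survived the whole input
}

A packrat memo table, by contrast, is indexed by (rule, input 
position), which is where its O(n) memory factor comes from.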

> Yup. But I'm not talking definitions here, I'm talking about the 
> notion of "integrated lexing", and lexing is certainly about 
> characters... or was it?

I thought integrated lexing was about performing lexing along with 
parsing and specifying the lexical rules directly inside the grammar. 
From the former we get structural context to narrow down the 
available options, and from the latter we get a uniform grammar 
description.
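
For example, in a made-up PEG fragment (not tied to any particular 
tool's syntax) the lexical rules sit right next to the syntactic 
ones, and the syntactic context already tells the parser that only 
an Identifier can follow the keyword:

Decl       <- "struct" Spacing Identifier Spacing "{" Spacing "}"
Identifier <- [a-zA-Z_] [a-zA-Z0-9_]*
Spacing    <- (" " / "\t" / "\n")*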

