Let's stop parser Hell
Roman D. Boiko
rb at d-coding.com
Sat Jul 7 15:32:47 PDT 2012
On Saturday, 7 July 2012 at 22:05:05 UTC, Dmitry Olshansky wrote:
> That would not always work, but yes that's what we should do
> IMHO.
I didn't do a deeper research on this, just shared my current
understanding. When that doesn't work in a particular case, we
will need to find the problem and solve that.
> It's not what packrats do however (when I say algorithm I mean
> specifically packrat). And PEGs are commonly associated with
> packrats.
As Philippe Sigaud admitted, that is an incorrect association.
>>> it's obvious how backtracking comes in ... whan say A fails
>>> somewhere in the middle of it.
>> Simply stop tracking respective state. Process others in
>> parallel as I described above.
>>
> Yeah it could be done, but it doesn't buy you a thing in a lot
> of cases unlike in regular grammar. The problem is that state
> is not as simple as in regex for instance (even in regex that
> must capture some submatches DFA won't do BTW). Thus you can't
> merge seemingly identical states because they depend on
> (different) interpretation of previous part of input.
Well, if you must track the path to a particular state, then DFA
simply doesn't match the problem and there's nothing we can do.
> That's why DFA in ANTLR is only used to do lookahead or lexing,
> it doesn't maintain path through states during parsing.
See above. Is maintaining path really necessary? I thought that
DFA with additional states could be created for any particular
situation of ambiguity, etc. This needs further research.
>>> Of course there is a fair amount of bookkeeping that reduces
>>> doing
>>> work all over again but it does ... allocate.
>>
>> The amount of used memory would be proportional to the length
>> of input
>> after the branching point times number of rules (actually, of
>> states in
>> the DFA, which is likely larger but still finite for a
>> particular grammar).
>>
> Finite and proportional to input size? Another way to say why
> an O(n) memory requirement for packrats?
Sorry, I incorrectly included the O(n) multiplier. It should be
finite. When you go to the next character, you don't need
previous states any more.
> Yup. But I'm not talking definitions here I'm talking the
> notion of "integrated lexing" and lexing is certainly about
> characters or was it?
I thought integrated lexing was about doing it along with parsing
and specifying its rules directly inside the grammar. From the
former we get structural context to narrow-down available
options, and from the latter we have a uniform grammar
description.
More information about the Digitalmars-d
mailing list