std.d.lexer: pre-voting review / discussion
deadalnix
deadalnix at gmail.com
Wed Sep 11 22:00:09 PDT 2013
On Thursday, 12 September 2013 at 03:37:55 UTC, H. S. Teoh wrote:
> I still don't understand why backtracking is necessary in the
> first
> place. I would've thought a modern parser should be well able
> to encode
> seen tokens in its state so that backtracking is never
> necessary. Or
> does D grammar have tricky bits that cannot be handled this
> way, that
> I'm unaware of?
>
The problem is that it can cause a exponential (and I literally
mean exponential here) amount of complexity.
In some cases, the complexity is manageable, but in other that
don't make any sense (it has to be noted that even full lexing
don't make any sens here). For instance :
int foo()() {}
^
When you are at the caret position, you don't know if you face a
function declaration or a template declaration. You could go for
some ambiguous parsing, but each template argument can itself be
a type, an expression or a symbol.
In such situation, it is much simpler to skip input to the
closing parentheses, check what's coming after and act
accordingly. The alternative is to go for some ambiguous
function/template parameters parsing and resolve at the end, but
as template argument are themselves ambiguous
type/expression/symbols, the amount of complexity in the parser
is doomed to explode. Note that the example is not self
contained, for instance, both template parameters and argument
can imply template instantiation, which means ambiguous argument
parsing.
SDC does a good deal of ambiguous parsing without backtracking
(more than DMD does), but you got to draw the line somewhere.
What I'd like to see is a go to the closing token feature, that
can eventually take a fast path to do so, or an efficient token
buffering system (I'm not sure which feature would be the
fastest, but I'm inclined to think this is the first one,
however, that won't be suited for a dmd style parser).
More information about the Digitalmars-d
mailing list