std.d.lexer: pre-voting review / discussion

Wed Sep 11 22:00:09 PDT 2013

On Thursday, 12 September 2013 at 03:37:55 UTC, H. S. Teoh wrote:
> I still don't understand why backtracking is necessary in the 
> first
> place. I would've thought a modern parser should be well able 
> to encode
> seen tokens in its state so that backtracking is never 
> necessary. Or
> does D grammar have tricky bits that cannot be handled this 
> way, that
> I'm unaware of?
>

The problem is that it can cause a exponential (and I literally 
mean exponential here) amount of complexity.

In some cases, the complexity is manageable, but in other that 
don't make any sense (it has to be noted that even full lexing 
don't make any sens here). For instance :

int foo()() {}
        ^

When you are at the caret position, you don't know if you face a 
function declaration or a template declaration. You could go for 
some ambiguous parsing, but each template argument can itself be 
a type, an expression or a symbol.

In such situation, it is much simpler to skip input to the 
closing parentheses, check what's coming after and act 
accordingly. The alternative is to go for some ambiguous 
function/template parameters parsing and resolve at the end, but 
as template argument are themselves ambiguous 
type/expression/symbols, the amount of complexity in the parser 
is doomed to explode. Note that the example is not self 
contained, for instance, both template parameters and argument 
can imply template instantiation, which means ambiguous argument 
parsing.

SDC does a good deal of ambiguous parsing without backtracking 
(more than DMD does), but you got to draw the line somewhere.

What I'd like to see is a go to the closing token feature, that 
can eventually take a fast path to do so, or an efficient token 
buffering system (I'm not sure which feature would be the 
fastest, but I'm inclined to think this is the first one, 
however, that won't be suited for a dmd style parser).