dmd support for IDEs and the D tool chain

Fri Oct 16 22:37:10 PDT 2009

"Ellery Newcomer" <ellery-newcomer at utulsa.edu> wrote in message 
news:hbaom1$138v$1 at digitalmars.com...
> Nick Sabalausky wrote:
>> "Ellery Newcomer" <ellery-newcomer at utulsa.edu> wrote in message
>> news:hbak0n$q5b$1 at digitalmars.com...
>>> I could count the number of places that are ambiguous syntactically or
>>> semantically on one hand, and maybe the number of places that require
>>> arbitrary lookahead also. Do LALR parsers care about arbitrary
>>> lookahead? LL(k) parsers do.
>>
>> Beats me, I'm not nearly as up on parsing theory as I'd like to be. Could
>> you give me a simple example?
>>
>>
> Well, let's see here...
>
> PrimExp -> structLiteral
> PrimExp -> functionLiteral
>
> and suppose
>
> structLiteral => { /* really, really looooooooooong expression*/ }
>
> functionLiteral => { statement }
> functionLiteral => { /* really, really looooooooooong expression*/ ; }
>
> In an LL(k) parser, when you get to PrimExp, you have to know which
> choice to make based on the first k lookahead characters (probably
> translates to '{' and one token past that). Obviously, that won't work
> for this piece, so it isn't LL(k). You have to try parsing the one, and
> back up and parse the other if the first one fails (dmd should do this,
> but currently it just looks ahead for the first semicolon. come to think
> of it, I should try patching that..)
>
> ANTLR has pretty good support for backtracking, so writing a D grammar
> for it wasn't too difficult, but then the resultant performance isn't
> anything near what I'd like.
>
> But what I'm wondering about LALR is will it have to back up if it
> chooses wrong, or can it sail on through in one parse attempt. I bet it
> can.

I'll write up a test in a minute, but for the specific type of lookahead 
you're talking about, that shouldn't be a problem for LR's in general 
(including LALR). An LR doesn't start with 'PrimExp', it starts by looking 
at that first '{'. At this point it doesn't give a rat's tiny hiney what 
nonterminal the '{' is part of - it'll just consume more tokens until it 
narrows down the list of "possible matches" to just one (or none, in the 
case of a syntax error). And once that happens, it already knows all it 
needs to know, and it moves on. So there's no looking ahead or backtracking 
needed (except maybe for a one-token lookahead to see if the next token is 
allowed and what possible matches it might rule out)

>
> And how about actual ambiguity? How well does GOLD handle that?

When you use GOLD to compile a grammar into a "compiled grammar table" 
(which will be used by the parsing engine), it'll identify and report any 
ambiguities, which, in LR parsing, fall in to one of two categories:

Reduce-Reduce conflicts: GOLD flags these as errors.

Shift-Reduce conflicts: GOLD flags these as warnings and attempts to resolve 
them by always assuming "shift". Sometimes that works out fine, sometimes it 
doesn't. But it's best to minimize them in any case.

With either type of ambiguity, it'll tell you all about the offending 
rule(s), and allow you to jump to a view of the related state in the LALR 
state table. (It handles lex problems similarly.)