dmd support for IDEs and the D tool chain

Sat Oct 17 09:09:45 PDT 2009

Nick Sabalausky wrote:
> "Ellery Newcomer" <ellery-newcomer at utulsa.edu> wrote in message 
> news:hbaom1$138v$1 at digitalmars.com...
>> Nick Sabalausky wrote:
>>> "Ellery Newcomer" <ellery-newcomer at utulsa.edu> wrote in message
>>> news:hbak0n$q5b$1 at digitalmars.com...
>>>> I could count the number of places that are ambiguous syntactically or
>>>> semantically on one hand, and maybe the number of places that require
>>>> arbitrary lookahead also. Do LALR parsers care about arbitrary
>>>> lookahead? LL(k) parsers do.
>>> Beats me, I'm not nearly as up on parsing theory as I'd like to be. Could
>>> you give me a simple example?
>>>
>>>
>> Well, let's see here...
>>
>> PrimExp -> structLiteral
>> PrimExp -> functionLiteral
>>
>> and suppose
>>
>> structLiteral => { /* really, really looooooooooong expression*/ }
>>
>> functionLiteral => { statement }
>> functionLiteral => { /* really, really looooooooooong expression*/ ; }
>>
>> In an LL(k) parser, when you get to PrimExp, you have to know which
>> choice to make based on the first k lookahead characters (probably
>> translates to '{' and one token past that). Obviously, that won't work
>> for this piece, so it isn't LL(k). You have to try parsing the one, and
>> back up and parse the other if the first one fails (dmd should do this,
>> but currently it just looks ahead for the first semicolon. come to think
>> of it, I should try patching that..)
>>
>> ANTLR has pretty good support for backtracking, so writing a D grammar
>> for it wasn't too difficult, but then the resultant performance isn't
>> anything near what I'd like.
>>
>> But what I'm wondering about LALR is will it have to back up if it
>> chooses wrong, or can it sail on through in one parse attempt. I bet it
>> can.
> 
> I'll write up a test in a minute, but for the specific type of lookahead 
> you're talking about, that shouldn't be a problem for LR's in general 
> (including LALR). An LR doesn't start with 'PrimExp', it starts by looking 
> at that first '{'. At this point it doesn't give a rat's tiny hiney what 
> nonterminal the '{' is part of - it'll just consume more tokens until it 
> narrows down the list of "possible matches" to just one (or none, in the 
> case of a syntax error). And once that happens, it already knows all it 
> needs to know, and it moves on. So there's no looking ahead or backtracking 
> needed (except maybe for a one-token lookahead to see if the next token is 
> allowed and what possible matches it might rule out)
> 
>> And how about actual ambiguity? How well does GOLD handle that?
> 
> When you use GOLD to compile a grammar into a "compiled grammar table" 
> (which will be used by the parsing engine), it'll identify and report any 
> ambiguities, which, in LR parsing, fall in to one of two categories:
> 
> Reduce-Reduce conflicts: GOLD flags these as errors.
> 
> Shift-Reduce conflicts: GOLD flags these as warnings and attempts to resolve 
> them by always assuming "shift". Sometimes that works out fine, sometimes it 
> doesn't. But it's best to minimize them in any case.
> 
> With either type of ambiguity, it'll tell you all about the offending 
> rule(s), and allow you to jump to a view of the related state in the LALR 
> state table. (It handles lex problems similarly.)
> 
> 

Does GOLD let you manually specify how to resolve the ambiguities? Cuz
you're going to have them with D, and murphy says they're going to be
reduce-reduce :)

All in all, she sounds pretty sweet. Maybe I'll try porting my D grammar
to GOLD when I'm past exams.

By the way, do you know if there be any way to put conditional
productions in GOLD grammars? Like if you wanted to have a grammar that
recognizes D1 or D2 dependent on a settable flag?

Wait wait wait wait, just looked on wikipedia, GOLD uses DFAs for
lexing??!!! Please tell me she can do context sensitive lexers,
otherwise how are you going to get perl-style strings let alone nesting
comments?