D grammar as LL(1) grammar ( OT)

Mon Sep 17 02:42:36 PDT 2007

BCS wrote:
> Reply to Jascha,
> 
> 
>> Furthermore, LR parsers consider multiple potential parse trees at the
>> same time, throwing away the wrong guesses, once the necessary tokens
>> get available. LL parsers have to check each potential parse tree
>> separately.
>>
> 
> In general that may be true (I'd have to check) but often a specific 
> grammar can be rearranges to (partly) avoid that, particularly the parts 
> that appear in several trees.
> 
> for instance:
> 
> A ::= B c C | B d D | B e E ;
> 
> can be rearranged as
> 
> A ::= B A_ ;
> A_ ::= c C | d D | e E ;
> 
> This at a minimum can avoid reparsing the B part of each option.
> 
> Admittedly this is actually changing the grammar that is being parsed 
> (so I can't claim to be doing a better job with the original grammar) 
> but these can be done automatically.

if you make sure, that the parser still generates the same parse tree as 
for the unchanged grammar, that'd be cool. i wonder how this approach 
will do for the (large) D grammar.

take a look at http://seatd.mainia.de/grammar.html if you like. it's my 
version of the D grammar from the docs with the errors ironed out 
(currently, i'm not aware of any more, that is). it might save you some 
time.
like the version from the docs, it uses left recursion, though. but that 
you would have to change anyway. that grammar successfully parses tango, 
phobos, derelict, blade, dfl, core32 and more.

what i'd love to do is to have a benchmark some day, to let the 
different parsing techniques compete. i'd patch the DMD source, such 
that it can be run as a standalone parser and it'll represent the 
hand-written LL team. my seatd parser will run for the automated LR team 
and it'll be fun if you would throw your parser in as the automated LL 
team :)