Let's stop parser Hell

David Piepgrass qwertie256 at gmail.com
Sat Jul 7 14:43:57 PDT 2012


On Saturday, 7 July 2012 at 20:39:18 UTC, Roman D. Boiko wrote:
> On Saturday, 7 July 2012 at 20:26:07 UTC, David Piepgrass wrote:
>> I'd like to add that if we give tree parsing first-class 
>> treatment, I believe the most logical approach to parsing has 
>> three or more stages instead of the traditional two 
>> (lex+parse):
>>
>> 1. Lexer
>> 2. Tree-ification
>> 3. Parsing to AST (which may itself use multiple stages, e.g. 
>> parse the declarations first, then parse function bodies later)
>>
>> The new stage two simply groups things that are in parenthesis 
>> and braces. So an input stream such as the following:
>
> I bet that after stage 2 you would have performed almost the 
> same amount of work (in other words, spent almost the same 
> time) as you would if you did full parsing. After that you 
> would need to iterate the whole tree (possibly multiple times), 
> modify (or recreate if the AST is immutable) its nodes, etc. 
> Altogether this might be a lot of overhead.
>
> My opinion is that tree manipulation is something that should 
> be available to clients of parser-as-a-library or even of 
> parser+semantic analyzer, but not necessarily advantageous for 
> parser itself.

Hmm, you've got a good point there, although simple 
tree-ification is clearly less work than standard parsing, since 
statements like "auto x = y + z;" would be quickly "blitted" into 
the same node in phase 2, but would become multiple separate 
nodes in phase 3.

What I like about it is not its performance, but how it matches 
the way we think about languages. Humans tend to see overall 
structure first, and examine the fine details later. The tree 
parsing approach is similarly nonlinear and can be modularized in 
a way that might be more intuitive than traditional EBNF.

On the other hand, one could argue it is /too/ flexible, 
admitting so many different approaches to parsing that a 
front-end based on this approach is not necessarily intuitive to 
follow; and of course, not using a standard EBNF-type grammar 
could be argued to be bad.

Still... it's a fun concept, and even if the initial parsing ends 
up using the good-old lex-parse approach, semantic analysis and 
lowering can benefit from a tree parser. Tree parsing, of course, 
is just a generalization of linear parsing and so a tree parser 
generator (TPG) could work equally well for flat input.



More information about the Digitalmars-d mailing list