D parsing

Martin Nowak code at dawg.eu
Tue Nov 5 10:57:16 PST 2013


On 10/30/2013 11:39 PM, jgetner wrote:
> Is there a function in D for evaluating expressions before compile time?.

I want to throw two more ideas in the parser generator topic which
I carry around for quite some time. I think combined they'd enable AST 
generation from clean and simple grammars. Maybe someone find this 
interesting enough to give it a try, I hardly will ever get to this.

There is an interesting paper [MTP] that describes how a slight 
modification of the ENBF grammar can be used to generate a usable AST
from the grammar definition. Each AST node is named like it's 
production. This is possible because the modified grammar disallows to 
mix alternates and sequences which forces one to name all productions.

The other paper is [GLL] parsing which is a new general parser 
generation technique. The main advantage over GLR, it's a top-down 
parser. This allows to modularize grammars, e.g. you can define a 
grammar that reuses/imports an existing grammar for expressions.

The reason to choose generalized parsers instead of say LALR(1)
is maintainability of the grammars. The needed modifications to make
a context free grammar LR(1) or LALR(1)-compatible often make them 
extremely hard to read and change.
Parse forests (many parse trees) are not a problem I think because most 
CFG only have local ambiguities so they resolve to a single parse tree.

### A motivating example (desiderata)

grammar.d
```
import number;

Identifier = "[_a-zA-Z][a-zA-Z]*";

// This still needs some syntax to annotate precedence and associativity
MulExpr = Expr "*" Expr;
DivExpr = Expr "/" Expr;
AddExpr = Expr "+" Expr;
SubExpr = Expr "-" Expr;
PowExpr = Expr "^^" Expr;

Expr = MulExpr | DivExpr | AddExpr | SubExpr | PowExpr | Number | Identifer;

Arguments = Expr % ",";

ExprStmt = Expr ";";
CallStmt = Identifier "(" Arguments ")";

Stmt = ExprStmt | CallStmt;
Stmts = Stmt*;
```
```d
auto p = genParser(import("grammar.d"));
auto stmts = p.parseStmts(input);
autp expr = p.parseExpr(input);
foreach (arg; p.parseArguments(input))
{}

final switch (expr)
{
case MulExpr.tag:
case DivExpr.tag:
case AddExpr.tag:
case SubExpr.tag:
case PowExpr.tag:
}

static assert(typeof(MulExpr == Tuple!(Expr, "expr0", Expr, "expr1")));
static assert(typeof(Stmt == Variant!(ExprStmt, CallStmt)));
static assert(typeof(Identifier) == string));
```

[MTP]: http://babel.ls.fi.upm.es/research/mtp/
        http://babel.ls.fi.upm.es/~angel/papers/finalcopy-2005prole.pdf

[GLL]: http://www.rhul.ac.uk/computerscience/research/csle/gllparsers.aspx
        http://dotat.at/tmp/gll.pdf
        for implementers - Modelling GLL Parser Implementations 
(http://dx.doi.org/10.1007/978-3-642-19440-5_4)


More information about the Digitalmars-d mailing list