Pegged, a Parsing Expression Grammar (PEG) generator in D

Philippe Sigaud philippe.sigaud at gmail.com
Sat Mar 10 15:28:42 PST 2012


Hello,

I created a new Github project, Pegged, a Parsing Expression 
Grammar (PEG) generator in D.

https://github.com/PhilippeSigaud/Pegged

docs: https://github.com/PhilippeSigaud/Pegged/wiki

PEG: http://en.wikipedia.org/wiki/Parsing_expression_grammar

The idea is to give the generator a PEG with the standard syntax. 
 From this grammar definition, a set of related parsers will be 
created, to be used at runtime or compile time.

Usage
-----

To use Pegged, just call the `grammar` function with a PEG and 
mix it in. For example:


import pegged.grammar;

mixin(grammar("
     Expr     <- Factor AddExpr*
     AddExpr  <- ('+'/'-') Factor
     Factor   <- Primary MulExpr*
     MulExpr  <- ('*'/'/') Primary
     Primary  <- Parens / Number / Variable / '-' Primary

     Parens   <- '(' Expr ')'
     Number   <~ [0-9]+
     Variable <- Identifier
"));



This creates the `Expr`, `AddExpr`, `Factor` (and so on) parsers 
for basic arithmetic expressions with operator precedence ('*' 
and '/' bind stronger than '+' or '-'). `Identifier` is a 
pre-defined parser recognizing your basic C-style identifier. 
Recursive or mutually recursive rules are OK (no left recursion 
for now).

To use a parser, use the `.parse` method. It will return a parse 
tree containing the calls to the different rules:

// Parsing at compile-time:
enum parseTree1 = Expr.parse("1 + 2 - (3*x-5)*6");

pragma(msg, parseTree1.capture);
writeln(parseTree1);

// And at runtime too:
auto parseTree2 = Expr.parse(" 0 + 123 - 456 ");
assert(parseTree2.capture == ["0", "+", "123", "-", "456"]);



Features
--------

* The complete set of PEG operators are implemented
* Pegged can parse its input at compile time and generate a 
complete parse tree at compile time. In a word: compile-time 
string (read: D code) transformation and generation.
* You can parse at runtime also, you lucky you.
* Use a standard and readable PEG syntax as a DSL, not a bunch of 
templates that hide the parser in noise.
* But you can use expression templates if you want, as parsers 
are all available as such. Pegged is implemented as an expression 
template, and what's good for the library writer is sure OK for 
the user too.
* Some useful additional operators are there too: a way to 
discard matches (thus dumping them from the parse tree), to push 
captures on a stack, to accept matches that are equal to another 
match
* Adding new parsers is easy.
* Grammars are composable: you can put different 
`mixin(grammar(rules));` in a module and then grammars and rules 
can refer to one another. That way, you can have utility grammars 
providing their functionalities to other grammars.
* That's why Pegged comes with some pre-defined grammars (JSON, 
etc).
* Grammars can be dumped in a file to create a D module.

More advanced features, outside the standard PEG perimeter are 
there to bring more power in the mix:

* Parametrized rules: `List(E, Sep) <- E (Sep E)*` is possible. 
The previous rule defines a parametrized parser taking two other 
parsers (namely, `E` and `Sep`) to match a `Sep`-separated list 
of `E`'s.
* Named captures: any parser can be named with the `=` operator. 
The parse tree generated by the parser (so, also its matches) is 
delivered to the user in the output. Other parsers in the grammar 
see the named captures too.
* Semantic actions can be added to any rule in a grammar. Once a 
rule has matched, its associated action is called on the rule 
output and passed as final result to other parsers further up the 
grammar. Do what you want to the parse tree. If the passed 
actions are delegates, they can access external variables.


Philippe



More information about the Digitalmars-d-announce mailing list