Parsing with tools.rd: idc.pad
Justin Johansson
procode at adam-dott-com.au
Wed Sep 16 06:14:54 PDT 2009
Hmm, delightful.
Thanks for sharing.
There are obviously some very talented people out there :-)
Gotta put this in my input queue for later consumption.
JJ
downs Wrote:
> Justin Johansson wrote:
> >downs Wrote:
> >>
> >> Justin Johansson wrote:
> >>> Can D people please recommend suitable tools for generating a parser (in D) for an LL(1) grammar? There are bound to be much better parser generator tools available nowadays than at my last foray into this area 10+ years ago with YACC. I've heard of tools like bison, SableCC etc. but, apart from the names, I know nothing about them.
> >>>
> >>> (Note: this question is not about writing a parser for D. It is about writing a parser in D for another language which has an LL(1) grammar.)
> >>>
> >>> Thanks in advance for all help.
> >>>
> >>> -- Justin Johansson
> >>>
> >> In a completely different vein, tools.rd is a simplistic recursive descent parser framework, implemented at compile time, that I've used for most/all of my toy languages. It keeps things trivial: there's no lexing stage, it parses straight from the input string. It's not that well documented, but if you want, give me a simple language description and I can write you a sample parser. It's probably the easiest to use though - just mix it in from D code :)
> >
> > Hi downs,
> >
> > Thanks for the offer, but since YACC is my prior background I'll probably go to the closest tool, which is the modern variant for LL(1). Still, if you have a small sample to share, I'm sure other D people will be delighted.
> >
> > <JJ/>
> >
>
> Well for instance, take the PAD (Pastebin Adventure) component of my IRC bot, that can run simple text adventures from a variety of sources, like local Gobby sessions, Wikis and (originally) Pastebin.com:
>
> http://dsource.org/projects/scrapple/browser/trunk/idc/pad
>
> Let's look at http://dsource.org/projects/scrapple/browser/trunk/idc/pad/engine.d
>
> L175: gotToken
>
> Functions like this form the building blocks of tools.rd parsing. They always have the form "bool gotBlarghle(ref string st, out T result)" and return true if result could be parsed from st, otherwise false (in which case st is not modified).
>
> gotToken trivially removes a token from the input text.
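
To make the building-block convention concrete, here is a minimal sketch of a function in the same shape. It is not taken from tools.rd or engine.d: the name gotInt and its whitespace handling are assumptions for illustration, and it is written against current D (Phobos) rather than the D1 code linked above.

```d
import std.ascii : isDigit, isWhite;

// Hypothetical building block in the "bool gotX(ref string, out T)" shape.
// Parses a non-negative decimal integer from the front of st.
bool gotInt(ref string st, out int result)
{
    auto scratch = st;                     // work on a copy so failure leaves st untouched
    while (scratch.length && isWhite(scratch[0]))
        scratch = scratch[1 .. $];         // skip leading whitespace
    size_t n = 0;
    while (n < scratch.length && isDigit(scratch[n]))
    {
        result = result * 10 + (scratch[n] - '0');
        n++;
    }
    if (n == 0) return false;              // no digits found: fail, st unchanged
    st = scratch[n .. $];                  // success: consume what was parsed
    return true;
}
```

The key point is the scratch copy: the input is only advanced once the whole parse has succeeded, which is what lets callers try alternatives freely.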
>
> L200: bool accept(ref string st, string cmp): The parser framework calls this function internally to decide whether st starts with a comparison string; if so, that string is removed from st and true is returned. accept removes tokens from both strings and compares them until a comparison fails (returns false, st not modified) or cmp is used up (returns true).
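
The token-by-token comparison described for accept can be sketched roughly as follows. This is an approximation, not the code from engine.d: the helper takeToken is hypothetical, and it assumes tokens are simply whitespace-separated words, which may differ from how gotToken really splits input.

```d
import std.ascii : isWhite;

// Hypothetical tokenizer: skips leading whitespace and returns the next run
// of non-whitespace characters (an empty string if the input is exhausted).
string takeToken(ref string s)
{
    while (s.length && isWhite(s[0])) s = s[1 .. $];
    size_t n = 0;
    while (n < s.length && !isWhite(s[n])) n++;
    auto tok = s[0 .. n];
    s = s[n .. $];
    return tok;
}

// Sketch of accept: compare st against cmp one token at a time.
bool accept(ref string st, string cmp)
{
    auto scratch = st;                     // st is only updated on success
    while (true)
    {
        auto want = takeToken(cmp);        // cmp is a local copy, safe to consume
        if (!want.length) break;           // cmp used up: everything matched
        if (takeToken(scratch) != want) return false; // mismatch, st untouched
    }
    st = scratch;
    return true;
}
```

Comparing tokens rather than raw characters means that differences in spacing between the input and the comparison string do not cause a mismatch.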
>
> L230: The first use of the actual Parser DSL.
>
> return mixin(gotMatchExpr("s: log"));
>
> This simply matches "log" against the input string s. Nothing fancy.
>
> L282: Not related to the parser but still, IMHO, insanely cool.
> const string Table = `
>         | bool          | int         | string               | float
> --------+---------------+-------------+----------------------+--------
> Boolean | b             | b           | b?q{true}p:q{false}p | ø
> Integer | i != 0        | i           | Format(i)            | i
> String  | s == q{true}p | atoi(s)     | s                    | atof(s)
> Float   | ø             | cast(int) f | Format(f)            | f`;
>
> This table contains a conversion matrix from the engine's internal types to basic types. Two things are of interest:
>
> 1) q{}p is unrolled by .litstring_expand() into nested and escaped ""s. It's a backport of D2 nestable string literals to D1.
>
> 2) The table itself. tools.ctfe contains functionality to select rows, columns, and iterate the table in column-major order. This means the above table can be automatically translated into nested if/switch statements.
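
The idea of turning a textual table into code at compile time can be illustrated with a much smaller sketch. It does not use tools.ctfe, whose API is not shown in this post; splitOn, trim and cell are hypothetical helpers written with plain loops so that they can run under CTFE, and the example only looks up single cells rather than generating nested if/switch statements.

```d
// A cut-down conversion table: rows are source types, columns are targets.
enum Table =
`        | bool   | int
--------+--------+-----
Integer | i != 0 | i
Boolean | b      | b`;

// Split s on sep using plain loops, so it can be evaluated at compile time.
string[] splitOn(string s, char sep)
{
    string[] parts;
    size_t start = 0;
    foreach (i, ch; s)
        if (ch == sep) { parts ~= s[start .. i]; start = i + 1; }
    parts ~= s[start .. $];
    return parts;
}

// Remove leading and trailing spaces.
string trim(string s)
{
    while (s.length && s[0] == ' ') s = s[1 .. $];
    while (s.length && s[$ - 1] == ' ') s = s[0 .. $ - 1];
    return s;
}

// Return the text of the cell at (row, col); line 0 is the header and
// line 1 is the separator, so data rows start at line 2.
string cell(string table, string row, string col)
{
    auto lines = splitOn(table, '\n');
    size_t c = 0;
    foreach (i, h; splitOn(lines[0], '|'))
        if (trim(h) == col) c = i;
    assert(c > 0, "column not found");
    foreach (line; lines[2 .. $])
    {
        auto cells = splitOn(line, '|');
        if (trim(cells[0]) == row) return trim(cells[c]);
    }
    assert(0, "row not found");
}

// The cell text is pasted in as a D expression at compile time.
bool intToBool(int i) { return mixin(cell(Table, "Integer", "bool")); }
int boolToInt(bool b) { return mixin(cell(Table, "Boolean", "int")); }
```

Because cell is called inside mixin(...), the lookup happens entirely during compilation; at run time intToBool is just `return i != 0;`.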
>
> L487: A more instructive use of the parser framework.
>
> if (mixin(gotMatchExpr("st: [==$#eq=true$|!=$#neq=true$|<=$#eq=smaller=true$|>=$#eq=greater=true$|<$#smaller=true$|>$#greater=true$] "
> "$dg2 <- genExprMath$"
> ))) { ... }
>
> Okay, first we have a conditional branch: [a|b|c|d]. This matches each of the possible branches against the input string in turn. Segments in $$ indicate variable matches and/or programmatic reactions. $#eq=smaller=true$ basically translates to "execute eq=smaller=true when this part of the parse string is successfully reached".
>
> "$dg2 <- genExprMath$" means "Generate dg2 using the genExprMath function". It is assumed that this function follows the convention of bool(ref string, out typeof(dg2)).
>
> It hasn't been used in that sample, but "y <- foo/x" means "pass x as an extra parameter to foo". And that's basically it. :)
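
For readers who find the pattern syntax dense, here is a hand-written sketch of what a small pattern in this style amounts to. The pattern in the comment, the sub-parser gotDigit and the plain prefix-matching accept are all illustrative assumptions, not code from tools.rd or engine.d.

```d
// Simplified stand-in for the framework's accept: plain prefix match.
bool accept(ref string st, string cmp)
{
    if (st.length < cmp.length || st[0 .. cmp.length] != cmp) return false;
    st = st[cmp.length .. $];
    return true;
}

// Hypothetical sub-parser following the bool(ref string, out T) convention.
bool gotDigit(ref string st, out int result)
{
    if (!st.length || st[0] < '0' || st[0] > '9') return false;
    result = st[0] - '0';
    st = st[1 .. $];
    return true;
}

// Roughly what a pattern like "[<=$#le=true$|<$#lt=true$]$n <- gotDigit$"
// would do if written out by hand.
bool gotComparison(ref string st, out bool le, out bool lt, out int n)
{
    auto scratch = st;
    // Alternatives are tried in order, so the longer "<=" must come first.
    if (accept(scratch, "<=")) le = true;
    else if (accept(scratch, "<")) lt = true;
    else return false;
    // Flags set above are not rolled back if the next step fails.
    if (!gotDigit(scratch, n)) return false;
    st = scratch;                          // commit only once everything matched
    return true;
}
```

The ordering of alternatives matters here for the same reason it does in the original expression, where "<=" and ">=" are listed before "<" and ">".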
>
> Oh, just for fun, here's the unrolled D syntax for the above expression:
>
> (ref string s) {
>   auto scratch = s;
>   return (
>     true && (ref string s) {
>       auto scratch = s;
>       return (true && scratch.accept("==") && (((eq=true), true))) && ((s=scratch), true)
>         || (((scratch=s), true) && scratch.accept("!=") && (((neq=true), true))) && ((s=scratch), true)
>         || (((scratch=s), true) && scratch.accept("<=") && (((eq=smaller=true), true))) && ((s=scratch), true)
>         || (((scratch=s), true) && scratch.accept(">=") && (((eq=greater=true), true))) && ((s=scratch), true)
>         || (((scratch=s), true) && scratch.accept("<") && (((smaller=true), true))) && ((s=scratch), true)
>         || (((scratch=s), true) && scratch.accept(">") && (((greater=true), true))) && ((s = scratch), true);
>     }(scratch) && ( genExprMath(scratch, dg2 ))
>   ) && ((s = scratch), true);
> }(st)