Writing a Parser - Walnut and aPaGeD comments
Dan
murpsoft at hotmail.com
Wed Jan 9 16:02:28 PST 2008
:D
So far I've got it doing an LL(0) predictive tokenizer which generates a [still pretty buggy] AST.
I'm quite proud of it at the moment, as I'm now certain that I can accomplish LL(0) scanning for everything but binaryOperators, where it's LL(n) | n ∈ expression.
Jascha Wetzel Wrote:
> Alan Knowles wrote:
> > - Documentation
> > While I know it's a pain to write, the things you have already tend to
> > focus on how the parser is built, and are biased to someone
> > understanding the internals and phrase-ology involved in parsers, rather
> > than an end user - who just knows if I'm looking for this.. - then put
> > this, and the result is available in these variables:
>
> Yeah, i became aware of that through the feedback. Without thinking too
> much, i assumed that using parsers would be something you'd do only if
> you dealt with parsers intimately.
I took linguistics, and I'm an interpreter writer, and I still haven't looked at BNF or YACC/BISON notation yet. To be honest, I'm not interested in it. Like MathML, it's way too far from the machine to generate an *efficient* parser. Mine might not be efficient, but that wouldn't be intrinsic to it being low-level.
> A Terminal is a symbol that is "final" wrt. expansion, while a
> Non-Terminal can be expanded by some rules. Terminals, Tokens and Lexemes are *practically* more or less the same.
Linguistics assigns them very different meanings.
> If you think of parse trees when dealing with your grammar
Are those like sentence trees, with noun phrase, verb phrase, etc?
> ignore the rest), Non-Terminals are the inner nodes, and Terminals are
> the leaves.
That makes sense.
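To make the inner-node/leaf picture concrete, here is a minimal sketch in Python. The `Node` class, the `leaves` helper, and the example sentence are all hypothetical, made up for illustration; they are not taken from aPaGeD or Walnut. It also assumes every Non-Terminal expands to at least one child, which a real grammar with empty productions would not guarantee.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One node of a parse tree (hypothetical type, for illustration only)."""
    symbol: str
    children: List["Node"] = field(default_factory=list)

    def is_terminal(self) -> bool:
        # Simplifying assumption: a node without children is a Terminal (a leaf).
        return not self.children


def leaves(node: Node) -> List[str]:
    """Collect the Terminals of the tree from left to right."""
    if node.is_terminal():
        return [node.symbol]
    found: List[str] = []
    for child in node.children:
        found.extend(leaves(child))
    return found


# A sentence tree in the linguistics sense: the inner nodes are Non-Terminals.
tree = Node("Sentence", [
    Node("NounPhrase", [Node("the"), Node("dog")]),
    Node("VerbPhrase", [Node("barks")]),
])
```

Reading the leaves from left to right gives back the original input, which is why the Terminals end up at the bottom of the tree.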
> > - How to handle classic situations
When I first started with Walnut 0.x, I was grinding my brain trying to figure out how to match these correctly:
{ { /* } */ } , { } }
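One way to picture the matching, as a rough sketch in Python rather than how Walnut actually does it: keep a depth counter and jump over block comments before looking at braces. The function name `find_matching_brace` is made up for this example, and it deliberately ignores string literals and `//` comments, which a real scanner would also have to skip.

```python
def find_matching_brace(src: str, start: int) -> int:
    """Return the index of the '}' that closes the '{' at src[start]."""
    if src[start] != "{":
        raise ValueError("start must point at '{'")
    depth = 0
    i = start
    while i < len(src):
        # Jump over /* ... */ so braces inside comments are never counted.
        if src.startswith("/*", i):
            end = src.find("*/", i + 2)
            if end == -1:
                raise ValueError("unterminated block comment")
            i = end + 2
            continue
        if src[i] == "{":
            depth += 1
        elif src[i] == "}":
            depth -= 1
            if depth == 0:
                return i
        i += 1
    raise ValueError("unbalanced braces")


example = "{ { /* } */ } , { } }"
outer_close = find_matching_brace(example, 0)
inner_close = find_matching_brace(example, 2)
```

With the example above, the outer brace pairs with the very last character and the brace inside the comment is never seen.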
Another classical problem is JavaScript RegExp literals or divide:
/bob/i can be "divide bob divide i", or a regexp, depending on whether we expect an operator or operand.
How would you write that?
How would the machine read that?
Both are highly important, but I find most parser writers only care about the former, and about whether the product is a working AST.
If I only wanted that, I could write the interpreter entirely in JavaScript regular expressions and it'd only be 40 lines of code. In fact, I think someone did that already, but I'm sure he wasn't that terse.
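For the regexp-or-divide case above, here is a minimal sketch in Python of the "operator or operand expected" idea. It is an assumption-laden toy, not Walnut's code: the token set, the names `TOKEN`, `REGEX` and `tokenize`, and the rule that only `)` among the punctuation ends an operand are all invented for illustration. Real JavaScript also has to consider keywords, `}` and `]`, and character classes inside the literal.

```python
import re

# Hypothetical, heavily reduced token set: numbers, identifiers, some punctuation.
TOKEN = re.compile(r"(?P<num>\d+)|(?P<ident>[A-Za-z_$][\w$]*)|(?P<punct>[()+\-*=,;])")
# Simplified regexp literal: /body/flags, allowing backslash escapes in the body.
REGEX = re.compile(r"/(?:\\.|[^/\\\n])+/[A-Za-z]*")


def tokenize(src):
    tokens = []
    pos = 0
    expect_operand = True  # at the start of input an operand is expected
    while pos < len(src):
        if src[pos].isspace():
            pos += 1
            continue
        if src[pos] == "/":
            if expect_operand:
                # Operand position: the slash opens a regexp literal.
                m = REGEX.match(src, pos)
                if m is None:
                    raise SyntaxError(f"bad regexp literal at {pos}")
                tokens.append(("regexp", m.group()))
                pos = m.end()
                expect_operand = False
            else:
                # Operator position: the slash is a divide.
                tokens.append(("divide", "/"))
                pos += 1
                expect_operand = True
            continue
        m = TOKEN.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at {pos}")
        kind, text = m.lastgroup, m.group()
        tokens.append((kind, text))
        pos = m.end()
        # Numbers, identifiers and ')' end an operand; other punctuation starts one.
        expect_operand = kind == "punct" and text != ")"
    return tokens


as_regexp = tokenize("x = /bob/i")
as_divide = tokenize("a /bob/i")
```

The same characters `/bob/i` come out as one regexp token after `=`, and as divide, identifier, divide, identifier after `a`, using nothing more than one flag carried from the previous token.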
Regards,
Dan
More information about the Digitalmars-d-learn
mailing list