Writing a Parser - Walnut and aPaGeD comments

Wed Jan 9 16:02:28 PST 2008

: D

So far I've got it working as an LL(0) predictive tokenizer that generates a [still pretty buggy] AST.

I'm quite proud of it at the moment, as I'm now certain that I can accomplish LL(0) scanning for everything but binaryOperators, where it's LL(n) | n E expression.

Jascha Wetzel Wrote:
> Alan Knowles wrote:
> > - Documentation
> > While I know it's a pain to write, the things you have already tend to 
> > focus on how the parser is built, and are biased to someone 
> > understanding the internals and phrase-ology involved in parsers, rather 
> > than an end user - who just knows if I'm looking for this.. - then put 
> > this, and the result is available in these variables:
> 
> Yeah, i became aware of that through the feedback. Without thinking too 
> much, i assumed that using parsers would be something you'd do only if 
> you dealt with parsers intimately. 

I took linguistics, and I'm an interpreter writer, and I still haven't looked at BNF or YACC/BISON notation.  To be honest, I'm not interested in it.  Like MathML, it's way too far from the machine to generate an *efficient* parser.  Mine might not be efficient either, but that wouldn't be because it's low-level.

> A Terminal is a symbol that is "final" wrt. expansion, while a 
> Non-Terminal can be expanded by some rules. Terminals, Tokens and 
> Lexemes are *practically* more or less the same. 

Linguistics assigns them very different meanings.

> If you think of parse trees when dealing with your grammar

Are those like sentence trees, with noun phrase, verb phrase, etc?

> ignore the rest), Non-Terminals are the inner nodes, and Terminals are 
> the leaves.

That makes sense.
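
To check my own understanding, here is a minimal sketch in Python (not D, and not how aPaGeD or Walnut actually represent trees). The node layout and the names `tree`, `terminals` and `non_terminals` are my own assumptions for illustration. Inner nodes are non-terminals, and leaves are terminals, so reading the leaves left to right gives back the input.

```python
# Assumed layout: an inner node is (non_terminal_name, [children]);
# a leaf is a plain string (a terminal).
tree = ("Sentence", [
    ("NounPhrase", ["the", "parser"]),
    ("VerbPhrase", ["reads", ("NounPhrase", ["tokens"])]),
])

def terminals(node):
    """Collect the leaves (terminals) in left-to-right order."""
    if isinstance(node, str):
        return [node]
    _name, children = node
    found = []
    for child in children:
        found.extend(terminals(child))
    return found

def non_terminals(node):
    """Collect the names of the inner nodes (non-terminals), parents first."""
    if isinstance(node, str):
        return []
    name, children = node
    found = [name]
    for child in children:
        found.extend(non_terminals(child))
    return found
```

Calling `terminals(tree)` walks only the leaves, while `non_terminals(tree)` walks only the inner nodes.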

> > - How to handle classic situations

When I first started with Walnut 0.x, I was grinding my brain trying to figure out how to match these correctly:

{  { /* } */ }  , { } }
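
A minimal sketch of one way to handle this, in Python rather than D, and not what Walnut actually does: count brace depth, but jump over /* ... */ comments so the braces inside them never count. The function name `find_matching_brace` is hypothetical, and string literals and // comments are deliberately ignored here.

```python
def find_matching_brace(src: str, start: int) -> int:
    """Return the index of the '}' that closes the '{' at src[start].

    Block comments (/* ... */) are skipped, so braces inside them are ignored.
    Returns -1 if the brace is never closed.
    """
    if src[start] != "{":
        raise ValueError("start must point at '{'")
    depth = 0
    i = start
    while i < len(src):
        if src.startswith("/*", i):
            # Jump straight past the comment body.
            end = src.find("*/", i + 2)
            if end == -1:
                return -1  # unterminated comment
            i = end + 2
            continue
        ch = src[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return i
        i += 1
    return -1
```

On the line above, the first brace pairs with the very last one, because the } inside the comment is never counted.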

Another classic problem is telling JavaScript RegExp literals apart from divide:

/bob/i  can be "divide bob divide i", or a regexp, depending on whether we expect an operator or operand.
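
A minimal sketch of the "operator or operand" idea, in Python rather than D, and not Walnut's actual code: the tokenizer keeps one flag saying whether an operand is expected, and a slash is read as a regexp only when that flag is set. The token shapes (`SIMPLE`, `REGEX_LITERAL`) and the `tokenize` function are hypothetical and heavily simplified; keywords such as `return`, character classes containing a slash, and most punctuation are not handled.

```python
import re

# Assumed, simplified token shapes; real JavaScript has many more.
SIMPLE = re.compile(
    r"(?P<num>\d+)|(?P<ident>[A-Za-z_$][A-Za-z0-9_$]*)|(?P<punct>[-+*=(),;])"
)
REGEX_LITERAL = re.compile(r"/(?:\\.|[^/\\\n])+/[A-Za-z]*")

def tokenize(src):
    tokens = []
    expect_operand = True  # at the start of input an operand is expected
    i = 0
    while i < len(src):
        if src[i].isspace():
            i += 1
            continue
        if src[i] == "/":
            if expect_operand:
                # Operand position: the slash opens a regexp literal.
                m = REGEX_LITERAL.match(src, i)
                if m is None:
                    raise SyntaxError(f"unterminated regexp at offset {i}")
                tokens.append(("regexp", m.group()))
                i = m.end()
                expect_operand = False
            else:
                # Operator position: the slash is a divide.
                tokens.append(("punct", "/"))
                i += 1
                expect_operand = True
            continue
        m = SIMPLE.match(src, i)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {i}")
        kind, text = m.lastgroup, m.group()
        tokens.append((kind, text))
        i = m.end()
        # A number, identifier or ')' is followed by an operator;
        # anything else is followed by an operand.
        expect_operand = not (kind in ("num", "ident") or text == ")")
    return tokens
```

With this, `tokenize("x = /bob/i")` yields a single regexp token after the `=`, while `tokenize("a /bob/i")` yields two divides, because an identifier came first.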

How would you write that?
How would the machine read that?

Both are highly important, but I find most parser writers care only about the former, and about the product being a working AST.

If I only wanted that, I could write the interpreter entirely in JavaScript regular expressions and it would be only 40 lines of code.  In fact, I think someone did that already, but I'm sure he wasn't that terse.

Regards,
Dan