Writing a Parser - aPaGeD comments

Tue Jan 8 16:10:52 PST 2008

I did spend some time looking at aPaGeD for this yesterday - here's some 
feedback that may help (and if you can answer some of the questions, that 
would help me a lot as well)

- It works (yes, but you would be amazed: for the life of me I could 
not get antlr to do anything - java write once, run nowhere...) - so 
being able to generate code and working programs from the examples was 
great, and a real +100 for the project.

- Syntax - on the face of it looks reasonable and easy to understand, 
and is similar enough to antlr.
(That's the end of the really good stuff)

- Documentation
While I know it's a pain to write, the docs you already have tend to 
focus on how the parser is built, and are biased towards someone who 
understands the internals and phraseology of parsers, rather 
than an end user, who just wants to know: if I'm looking for this, then I 
put this, and the result is available in these variables.

Specifically I've no idea what the meanings of these are, and they are 
rather critical to the docs....:
"Terminal" "Non-Terminal"

- Regexes
While I can see the benefits, I'd much rather the compiler built them 
for me - part of the beauty of the BNF format is that it's easy to 
read, and explains regex style situations a lot better. Otherwise the docs 
need to explain how regexes deal with the classic situations (see below).

- How to handle classic situations
This is the key to success for the documentation (and what is seriously 
missing), as most people will probably have come from a lex/yacc 
background.

These are a few classic examples that the Documentation could do with.

* Top level parser starts.
Most grammars start with a top level statement, eg.
Program:
	Statements;

In which case the application should only start by matching Statements. 
The biggest problem I found was that I had no idea how to stop it 
matching any of the rules that are only relevant in a specific state 
(eg. see next example).

* Parse a string
This is a common pattern but it's quite difficult to see how to 
implement it. And as above, when I tried, the parser started matching 
DoubleQuotedStringChars at the start of the file (even though it's only 
used in DoubleQuotedString).

DoubleQuotedString:
	QUOTE DoubleQuotedStringChars QUOTE;

DoubleQuotedStringChars:
	(DoubleQuotedStringChar)*;

DoubleQuotedStringChar:
	"\" ANYCHAR |
	^QUOTE;
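
To make it clear what behaviour I'm after, here is a rough hand-written 
sketch in Python (not aPaGeD syntax, which is the bit I can't work out). 
The function name is made up for illustration. The point is that the 
per-character rules only ever run after an opening QUOTE has been seen, 
never at the start of the file:

```python
def parse_double_quoted_string(text, pos=0):
    """Match DoubleQuotedString at text[pos].

    Returns (contents, next_pos) on success, or None if no string starts here.
    """
    QUOTE = '"'
    if pos >= len(text) or text[pos] != QUOTE:
        return None  # no opening QUOTE, so the inner rules are never tried
    i = pos + 1
    chars = []
    while i < len(text):
        c = text[i]
        if c == "\\":
            # "\" ANYCHAR: keep the escaped character as-is
            if i + 1 >= len(text):
                return None  # dangling backslash at end of input
            chars.append(text[i + 1])
            i += 2
        elif c == QUOTE:
            return "".join(chars), i + 1  # closing QUOTE ends the string
        else:
            chars.append(c)  # ^QUOTE: any character except the quote
            i += 1
    return None  # ran out of input before the closing QUOTE
```

A caller would only invoke this where the grammar expects a string, which 
is the scoping I could not work out how to express.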

* Classic groupings:
	(.....)*    ie. zero or more of these matches.
	(.....)+    ie. one or more of these matches.
	(.....)?    ie. zero or one of these matches.
	(.....)=> ...    ie. if lookahead succeeds on (...), try to match what follows.
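
For reference, this is roughly the behaviour I'd expect from each grouping, 
sketched as small Python helpers. It is only an illustration under my own 
assumptions: all the names are made up, and a "parser" here is just a 
function taking (text, pos) and returning the next position or None.

```python
def literal(s):
    """Match the exact string s."""
    def parse(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return parse

def star(p):
    """(p)* : zero or more matches of p."""
    def parse(text, pos):
        while True:
            nxt = p(text, pos)
            if nxt is None or nxt == pos:  # stop on failure or an empty match
                return pos
            pos = nxt
    return parse

def plus(p):
    """(p)+ : one or more matches of p."""
    many = star(p)
    def parse(text, pos):
        first = p(text, pos)
        return None if first is None else many(text, first)
    return parse

def optional(p):
    """(p)? : zero or one match of p."""
    def parse(text, pos):
        nxt = p(text, pos)
        return pos if nxt is None else nxt
    return parse

def predicate(guard, then):
    """(guard)=> then : run `then` only if `guard` would match here.

    The guard is pure lookahead and consumes no input.
    """
    def parse(text, pos):
        return then(text, pos) if guard(text, pos) is not None else None
    return parse
```

So star(literal("ab")) applied to "ababx" at position 0 stops at position 4, 
while plus(literal("ab")) applied to "x" fails outright.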

Regards
Alan

Jascha Wetzel wrote:
> Dan wrote:
>> I've been messing with how to write a parser, and so far I've played 
>> with numerous patterns before eventually wanting to cry.
>>
>> At the moment, I'm trying recursive descent parsing.
>>
>> The problem is that I've realized I'm duplicating huge volumes of code 
>> to cope with the tristate decision of { unexpected, allow, require } 
>> for any given token.
>>
>> For example, to consume a for loop, you consume something similar to
>> /for\s*\((.*?)\)\s*\{(.*?)\}/
>>
>> I have it doing that, but my soul feels heavy with the masses of 
>> looped switches it's doing.  Is there any way to ease the pain?
> 
> a parser generator :)
> writing a parser or scanner manually is a bit like writing any program 
> in assembler - tedious, error-prone and not well maintainable. there's a 
> lot of stuff in a parser that can be automatically generated.
> even if you want to write the parser all by yourself, i'd rather suggest 
> you write a simple parser generator to do that tedious part for you.