How do I write a "lexer or parser generators"

OP 749691a80dee at 52222a85af35.anonbox.net
Thu Jan 23 00:12:07 PST 2014


I'd like Walter to reply to this. In his article here 
http://www.drdobbs.com/architecture-and-design/so-you-want-to-write-your-own-language/240165488

Walter says

> Somewhat more controversial, I wouldn't bother wasting time 
> with lexer or parser generators and other so-called "compiler 
> compilers." They're a waste of time. Writing a lexer and parser 
> is a tiny percentage of the job of writing a compiler. Using a 
> generator will take up about as much time as writing one by 
> hand, and it will marry you to the generator (which matters 
> when porting the compiler to a new platform). And generators 
> also have the unfortunate reputation of emitting lousy error 
> messages.

I have no idea how to write one. First off when I originally 
tried before using anything I wrote nonsense code. Second is I 
couldn't tell if I had a conflict (either shift reduce or reduce 
reduce) or if I wrote it wrong/badly. Third is I'm not sure how a 
tiny one should look like and my language is fairly large. I 
already understand flex and bison. I'm planning to rewrite from 
scratch so I could attempt doing so on my own but I don't know 
how to deal with the above.

I just looked at parse.c and I had no idea there is a precedence 
table. Why is there one rather than it being embedded like a 
switch statement which tries to handle all the higher precedence 
operations calling a function running the next set of precedence? 
I guess that does sound like a loop/table but I imagined it in 
switch statements.

Using bison I can write complex statements which are easy for me 
to grok and change. I wouldn't know how to change complex 
statements if I hand wrote the parser. I don't know all the 
places something like that would affect. Taking the syntax below 
I'm not sure how to fork a state and discard the invalid one.

> foo[5] = var2
> foo[5] foo = var2

Here when I see foo[5] I'm either accessing an array (first 
statement) or declaring variables as an array of 5 elements 
(second statement). Just like `Foo&foo` could be a reference or 
could be an AND statement. Foo is definitely processed on its 
own, I don't know how to process it as both (a fork) and continue 
on the parser to find a valid path.

My bison file has >2000 states. I'm not sure how much work it may 
mean if I handwrote my parser.


More information about the Digitalmars-d mailing list