How do I write a "lexer or parser generators"

Paulo Pinto pjmlp at progtools.org
Thu Jan 23 00:45:44 PST 2014


On Thursday, 23 January 2014 at 08:12:08 UTC, OP wrote:
> I'd like Walter to reply to this. In his article here 
> http://www.drdobbs.com/architecture-and-design/so-you-want-to-write-your-own-language/240165488
>
> Walter says
>
>> Somewhat more controversial, I wouldn't bother wasting time 
>> with lexer or parser generators and other so-called "compiler 
>> compilers." They're a waste of time. Writing a lexer and 
>> parser is a tiny percentage of the job of writing a compiler. 
>> Using a generator will take up about as much time as writing 
>> one by hand, and it will marry you to the generator (which 
>> matters when porting the compiler to a new platform). And 
>> generators also have the unfortunate reputation of emitting 
>> lousy error messages.
>
> I have no idea how to write one. First off when I originally 
> tried before using anything I wrote nonsense code. Second is I 
> couldn't tell if I had a conflict (either shift reduce or 
> reduce reduce) or if I wrote it wrong/badly. Third is I'm not 
> sure how a tiny one should look like and my language is fairly 
> large. I already understand flex and bison. I'm planning to 
> rewrite from scratch so I could attempt doing so on my own but 
> I don't know how to deal with the above.
>
> I just looked at parse.c and I had no idea there is a 
> precedence table. Why is there one rather than it being 
> embedded like a switch statement which tries to handle all the 
> higher precedence operations calling a function running the 
> next set of precedence? I guess that does sound like a 
> loop/table but I imagined it in switch statements.
>
> Using bison I can write complex statements which are easy for 
> me to grok and change. I wouldn't know how to change complex 
> statements if I hand wrote the parser. I don't know all the 
> places something like that would affect. Taking the syntax 
> below I'm not sure how to fork a state and discard the invalid 
> one.
>
>> foo[5] = var2
>> foo[5] foo = var2
>
> Here when I see foo[5] I'm either accessing an array (first 
> statement) or declaring variables as an array of 5 elements 
> (second statement). Just like `Foo&foo` could be a reference or 
> could be an AND statement. Foo is definitely processed on its 
> own, I don't know how to process it as both (a fork) and 
> continue on the parser to find a valid path.
>
> My bison file has >2000 states. I'm not sure how much work it 
> may mean if I handwrote my parser.


Hand written parsers are usually LL(K), you just need to write 
the grammar in a way that is LL(K) compliant.

Granted there are a few cases where only a LALR(k) parser will do 
(yacc and friends), but most context free grammars tend to be 
LL(K) as well.

There are lots of tutorials how to write such types of compilers, 
for example,

http://compilers.iecc.com/crenshaw/

http://www.ethoberon.ethz.ch/WirthPubl/CBEAll.pdf

http://llvm.org/docs/tutorial/

--
Paulo


More information about the Digitalmars-d mailing list