How do I write a "lexer or parser generators"

Boyd gaboonviper at gmx.net
Thu Jan 23 05:00:38 PST 2014


On Thursday, 23 January 2014 at 08:12:08 UTC, OP wrote:
> I'd like Walter to reply to this. In his article here 
> http://www.drdobbs.com/architecture-and-design/so-you-want-to-write-your-own-language/240165488
>
> Walter says
>
>> Somewhat more controversial, I wouldn't bother wasting time 
>> with lexer or parser generators and other so-called "compiler 
>> compilers." They're a waste of time. Writing a lexer and 
>> parser is a tiny percentage of the job of writing a compiler. 
>> Using a generator will take up about as much time as writing 
>> one by hand, and it will marry you to the generator (which 
>> matters when porting the compiler to a new platform). And 
>> generators also have the unfortunate reputation of emitting 
>> lousy error messages.
>
> Using bison I can write complex statements which are easy for 
> me to grok and change. I wouldn't know how to change complex 
> statements if I hand wrote the parser. I don't know all the 
> places something like that would affect. Taking the syntax 
> below I'm not sure how to fork a state and discard the invalid 
> one.
>
>> foo[5] = var2
>> foo[5] foo = var2
>
> Here when I see foo[5] I'm either accessing an array (first 
> statement) or declaring variables as an array of 5 elements 
> (second statement). Just like `Foo&foo` could be a reference or 
> could be an AND statement. Foo is definitely processed on its 
> own, I don't know how to process it as both (a fork) and 
> continue on the parser to find a valid path.

A simple solution to this problem would be to try and parse the 
longer statement first. If that doesn't work, then go for the 
shorter one. In your first case you have two possible 
statements:

  - the 'variable definition': [Type] [Name] ('=' [R-Expression])?
  - the 'value assignment': [L-Expression] '=' [R-Expression]

When you can't parse a variable definition, go back to where the 
statement started and try the value assignment.
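
To make that concrete, here is a rough sketch in Python of the 
"try one rule, rewind, try the other" idea for the two statements 
above. Everything in it is made up for illustration: the Parser 
class, the tuple-shaped AST, and the tiny expression grammar 
(numbers, names, and a single index) are my assumptions, not how 
any real compiler does it.

```python
import re


class ParseError(Exception):
    """Raised when the current rule does not match the input."""


class Parser:
    # Hypothetical toy parser: one statement per input string.
    def __init__(self, text):
        # Tokens are numbers, names, or single non-space symbols.
        self.tokens = re.findall(r"\d+|[A-Za-z_]\w*|\S", text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, symbol):
        if self.peek() != symbol:
            raise ParseError(f"expected {symbol!r}, got {self.peek()!r}")
        self.pos += 1

    def name(self):
        tok = self.peek()
        if tok is None or not (tok[0].isalpha() or tok[0] == "_"):
            raise ParseError(f"expected a name, got {tok!r}")
        self.pos += 1
        return tok

    def number(self):
        tok = self.peek()
        if tok is None or not tok.isdigit():
            raise ParseError(f"expected a number, got {tok!r}")
        self.pos += 1
        return int(tok)

    def end(self):
        if self.peek() is not None:
            raise ParseError(f"unexpected trailing token {self.peek()!r}")

    def type_(self):
        # [Type]: Name ('[' Number ']')?
        base = self.name()
        if self.peek() == "[":
            self.pos += 1
            size = self.number()
            self.expect("]")
            return ("array_type", base, size)
        return ("type", base)

    def l_expression(self):
        # [L-Expression]: Name ('[' R-Expression ']')?
        base = self.name()
        if self.peek() == "[":
            self.pos += 1
            index = self.r_expression()
            self.expect("]")
            return ("index", base, index)
        return ("var", base)

    def r_expression(self):
        # [R-Expression]: Number | L-Expression (deliberately tiny)
        tok = self.peek()
        if tok is not None and tok.isdigit():
            return ("num", self.number())
        return self.l_expression()

    def variable_definition(self):
        # [Type] [Name] ('=' [R-Expression])?
        type_ = self.type_()
        name = self.name()
        init = None
        if self.peek() == "=":
            self.pos += 1
            init = self.r_expression()
        self.end()
        return ("define", type_, name, init)

    def value_assignment(self):
        # [L-Expression] '=' [R-Expression]
        target = self.l_expression()
        self.expect("=")
        value = self.r_expression()
        self.end()
        return ("assign", target, value)

    def statement(self):
        start = self.pos              # remember where the statement began
        try:
            return self.variable_definition()
        except ParseError:
            self.pos = start          # discard the failed attempt
        return self.value_assignment()


print(Parser("foo[5] = var2").statement())
print(Parser("foo[5] foo = var2").statement())
```

The only "fork" here is the saved position in statement(): the 
failed attempt is thrown away by resetting self.pos, so nothing 
has to be processed as both things at once. The cost is that the 
shared prefix (foo[5]) gets parsed twice, which is usually fine 
for short statements like these.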


I found that writing a parser in general is not that hard. The 
difficult part is figuring out what the syntax and the AST should 
look like. I recommend just trying to create something for a 
limited subset of your language and working your way up from there.


More information about the Digitalmars-d mailing list