std.d.lexer requirements

Thu Aug 2 20:53:40 PDT 2012

On Thursday, August 02, 2012 23:41:39 Andrei Alexandrescu wrote:
> On 8/2/12 11:08 PM, Jonathan M Davis wrote:
> > You're not going to get as fast a lexer if it's not written specifically
> > for D. Writing a generic lexer is a different problem. It's also one that
> > needs to be solved, but I think that it's a mistake to think that a
> > generic lexer is going to be able to be as fast as one specifically
> > optimized for D.
> 
> Do you have any evidence to back that up? I mean you're just saying it.

Because all of the rules are built directly into the code. You don't have to
use regexes or anything like that. Pieces of the lexer could certainly be
generic or copied over to other lexers just fine, but when you write the lexer
by hand specifically for D, you can guarantee that it checks exactly what it
needs to for D and nothing more. It never loses efficiency by decoding where
it doesn't need to or by checking an additional character at any point.
Tuning it is also much easier, because you have control over the whole thing.
And given features such as token strings, I would think that using a generic
lexer on D would be rather difficult anyway.
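
To make the token string point concrete, here is a minimal sketch of why
they are awkward for a regex-driven lexer: the braces in q{ ... } nest, so
finding the end means counting depth, which a plain regular expression can't
do. The function name tokenStringEnd is hypothetical, not part of any
proposed std.d.lexer API, and the sketch is deliberately simplified: a real
lexer has to lex the tokens inside the token string, so that braces inside
nested string literals, character literals, and comments are not counted.
It also assumes that the decoding in question is UTF-8 decoding, and shows
that scanning by code unit avoids it here.

```d
import std.stdio;

/// Hypothetical helper: returns the index one past the closing '}' of the
/// token string that starts at src[start], or src.length if it is
/// unterminated.
/// Simplification: braces inside nested string literals, character
/// literals, and comments are counted too, which a real lexer must avoid.
size_t tokenStringEnd(string src, size_t start)
{
    assert(start + 1 < src.length
        && src[start] == 'q' && src[start + 1] == '{');

    size_t depth = 1;     // the opening "q{" has already been seen
    size_t i = start + 2; // first code unit after "q{"

    while (i < src.length)
    {
        // Indexing a string yields code units. '{' and '}' are ASCII,
        // so they can be found without decoding any UTF-8 sequences.
        switch (src[i++])
        {
            case '{':
                ++depth;
                break;
            case '}':
                if (--depth == 0)
                    return i;
                break;
            default:
                break;
        }
    }
    return src.length;
}

void main()
{
    string code = "auto s = q{ if (x) { y(); } };";
    size_t begin = 9; // index of the 'q' that opens the token string
    size_t end = tokenStringEnd(code, begin);
    writeln(code[begin .. end]);
}
```

The depth counter is the part that a table of independent regexes has no
place for. In a hand-written lexer it is just one more case in the code.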

If someone wants to try to write a generic lexer for D and see whether it can
beat the hand-written ones, then more power to them. But I don't see how you
could expect a generic lexer to shave the operations down to the bare minimum
necessary to get the job done, whereas a hand-written lexer can do that given
enough effort.

- Jonathan M Davis