std.d.lexer requirements

Timon Gehr timon.gehr at gmx.ch
Thu Aug 2 21:14:08 PDT 2012


On 08/03/2012 05:53 AM, Jonathan M Davis wrote:
> On Thursday, August 02, 2012 23:41:39 Andrei Alexandrescu wrote:
>> On 8/2/12 11:08 PM, Jonathan M Davis wrote:
>>> You're not going to get as fast a lexer if it's not written specifically
>>> for D. Writing a generic lexer is a different problem. It's also one that
>>> needs to be solved, but I think that it's a mistake to think that a
>>> generic lexer is going to be able to be as fast as one specifically
>>> optimized for D.
>>
>> Do you have any evidence to back that up? I mean you're just saying it.
>
> Because all of the rules are built directly into the code. You don't have to
> use regexes or anything like that.

The parts that can be specified with simple regexes are certainly not a
problem. A generic lexer based on string mixins should be able to generate
very close to optimal code, e.g. by merging common token prefixes and the
like.
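
To make "merging common token prefixes" concrete, here is a minimal sketch.
It is written in Python rather than D purely for illustration, and it
interprets the merged structure at run time, whereas a string-mixin-based
generator would presumably emit nested switch statements from it instead.
The names build_trie, longest_match, OPERATORS and TRIE are hypothetical.

```python
# Illustrative sketch only: all names here are made up for this example.

def build_trie(tokens):
    """Merge common prefixes of the fixed tokens into a nested-dict trie."""
    root = {}
    for tok in tokens:
        node = root
        for ch in tok:
            node = node.setdefault(ch, {})
        node[""] = tok  # the empty key marks "a token ends at this node"
    return root


def longest_match(trie, text, pos=0):
    """Return the longest token starting at text[pos], or None if none does."""
    node, best = trie, None
    for i in range(pos, len(text)):
        node = node.get(text[i])
        if node is None:
            break  # no token continues with this character
        best = node.get("", best)  # remember the longest complete token so far
    return best


# A family of tokens sharing the prefix ">": each character is examined once,
# no matter how many tokens begin with it.
OPERATORS = [">", ">=", ">>", ">>=", ">>>", ">>>="]
TRIE = build_trie(OPERATORS)
```

With this, longest_match(TRIE, ">>= 1") walks ">" then ">" then "=" exactly
once each and stops at the space, which is the same sequence of checks a
hand-written lexer would perform for these operators.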

> Pieces of the lexer could certainly be
> generic or copied over to other lexers just fine, but when you write the lexer
> by hand specifically for D, you can guarantee that it checks exactly what it
> needs to for D without any extra cruft or lost efficiency due to decoding where
> it doesn't need to or checking an additional character at any point or
> anything along those lines.

This is achievable with a fully generic lexer as well. Just add generic
peephole optimizations until the generated lexer is identical to what
the hand-written one would have looked like.

> And tuning it is much easier,

Yes.

> because you have control over the whole thing.

The library writer has control over the whole thing in each case.

> Also, given features such as token strings, I
> would think that using a generic lexer on D would be rather difficult anyway.
>

It would, of course, need to support custom parsing routines.
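
A rough sketch of what I mean, again in Python for illustration only: the
generic part handles the fixed tokens, and a table maps a prefix to a custom
routine that takes over for constructs like token strings. Everything here
(lex, lex_token_string, CUSTOM) is a hypothetical name, and the token string
routine is simplified: it only balances braces and ignores braces inside
nested string literals and comments.

```python
# Illustrative sketch only: names and the interface are assumptions.

def lex_token_string(text, pos):
    """Custom routine: consume q{...} with balanced braces.

    `pos` points at the 'q'. Returns the index just past the closing brace.
    Simplification: braces inside nested literals or comments are not skipped.
    """
    depth = 0
    for i in range(pos + 1, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return i + 1
    raise ValueError("unterminated token string")


# Prefix -> routine that lexes the rest of the construct.
CUSTOM = {"q{": lex_token_string}


def lex(text, fixed_tokens, custom):
    """Generic driver: custom routines first, then the longest fixed token."""
    pos, out = 0, []
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        for prefix, routine in custom.items():
            if text.startswith(prefix, pos):
                end = routine(text, pos)
                out.append(("custom", text[pos:end]))
                pos = end
                break
        else:
            # No custom prefix matched: pick the longest fixed token here.
            match = max(
                (t for t in fixed_tokens if text.startswith(t, pos)),
                key=len,
                default=None,
            )
            if match is None:
                raise ValueError("unexpected character at %d" % pos)
            out.append(("fixed", match))
            pos += len(match)
    return out
```

For example, lex("= q{a {b} c} ;", ["=", "==", ";"], CUSTOM) hands the whole
q{a {b} c} span to the custom routine and lexes the rest generically.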

> If someone wants to try and write a generic lexer for D and see if they can
> beat out any hand-written ones,

I'll possibly give it a shot if I can find the time.

> then more power to them, but I don't see how
> you could possibly expect to shave the operations down to the bare minimum
> necessary to get the job done with a generic lexer, whereas a hand-written
> parser can do that given enough effort.
>

If it is optimal for D lexing and optimal or close to optimal for other
languages, then it is profoundly more useful than just a D lexer.
