What would need to be done to get sdc.lexer to std.lexer quality?

Jakob Ovrum jakobovrum at gmail.com
Wed Aug 1 16:18:28 PDT 2012


On Wednesday, 1 August 2012 at 23:06:19 UTC, Bernard Helyer wrote:
> Okay, so I've seen several comments from several people
> regarding the need for a D lexer in Phobos. I figure
> I should contribute something to this NG other than
> misdirected anger, so here it is.
>
> SDC has a lexer, and it's pretty much complete. It handles
> Unicode and script lines, and #line and friends.
>
> It's currently MIT, but I've been meaning to relicense to
> Boost, so that's not an issue. It used to have some number
> lexing code stolen from DMD, but I removed that when we moved
> to MIT.
>
> https://github.com/bhelyer/SDC/blob/master/src/sdc/lexer.d
> https://github.com/bhelyer/SDC/blob/master/src/sdc/source.d
> https://github.com/bhelyer/SDC/blob/master/src/sdc/tokenstream.d
> https://github.com/bhelyer/SDC/blob/master/src/sdc/token.d
> https://github.com/bhelyer/SDC/blob/master/src/sdc/location.d
>
> TokenStream would need to become a range, name and specific
> interface details requested from you fine people.
>
> opKirbyRape will, with great regret, have to go.
>
> Documentation will need to be buffed, and it'll need to be
> renamed into Phobos style.
>
> I'm willing to do the work if people think it's worthwhile,
> and I can get some directed suggestions.
>
> -Bernard.

Some of the other comments I brought up on IRC:

  * Currently, each file is read in its entirety first, then parsed.
It is worth exploring the idea of reading files lazily, in chunks.
  * The current result (TokenStream) is a wrapper over a
GC-allocated array of Token class instances, and each instance is
its own GC allocation (new Token). It is worth exploring an
alternative allocation strategy for the tokens.
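
To make the two points above concrete, here is a rough sketch of what both could look like together. It is not SDC's code: ByteLexer, byteLexer, TokenType and the Token layout are names made up for illustration, it only handles ASCII words and single-character symbols, and the in-memory chunks stand in for lazy file reads. Tokens are plain structs that record an offset and a length, so collecting them costs one array allocation rather than one allocation per token.

```d
import std.algorithm.iteration : joiner;
import std.array : array;
import std.ascii : isAlphaNum, isWhite;
import std.range.primitives : empty, front, isInputRange, popFront;

enum TokenType { word, symbol }

// Value type: a Token[] is one contiguous block, with no `new` per token.
struct Token
{
    TokenType type;
    size_t offset; // byte offset of the token in the source
    size_t length; // length in bytes
}

// Hypothetical lazy lexer over any input range of bytes (ASCII only).
struct ByteLexer(R) if (isInputRange!R)
{
    private R input;
    private size_t pos;
    private Token current;
    private bool exhausted;

    this(R input)
    {
        this.input = input;
        popFront(); // prime the first token
    }

    @property bool empty() const { return exhausted; }
    @property Token front() const { return current; }

    void popFront()
    {
        // Skip whitespace between tokens.
        while (!input.empty && isWhite(cast(char) input.front))
            advance();

        if (input.empty)
        {
            exhausted = true;
            return;
        }

        immutable start = pos;
        if (isWordChar(cast(char) input.front))
        {
            while (!input.empty && isWordChar(cast(char) input.front))
                advance();
            current = Token(TokenType.word, start, pos - start);
        }
        else
        {
            advance();
            current = Token(TokenType.symbol, start, 1);
        }
    }

    private void advance()
    {
        input.popFront();
        ++pos;
    }

    private static bool isWordChar(char c)
    {
        return isAlphaNum(c) || c == '_';
    }
}

auto byteLexer(R)(R input)
{
    return ByteLexer!R(input);
}

void main()
{
    // Two chunks simulate lazy reads; "foo" straddles the boundary.
    const(ubyte)[][] chunks = [
        cast(const(ubyte)[]) "int fo",
        cast(const(ubyte)[]) "o = 1;",
    ];
    Token[] tokens = byteLexer(chunks.joiner).array;
    assert(tokens.length == 5);
    assert(tokens[1] == Token(TokenType.word, 4, 3));
}
```

For a real file, something along the lines of File(path).byChunk(4096).joiner from std.stdio should be able to stand in for chunks.joiner, though I have not tried that against SDC. Since the tokens no longer carry their text, a caller that wants the text has to keep the source around and slice it, or copy it out while lexing; which of those is right is one of the interface details that would still need deciding.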

There are a *lot* of little things that need to be done, but 
everything important is in place.



