What would need to be done to get sdc.lexer to std.lexer quality?

Walter Bright newshound2 at digitalmars.com
Wed Aug 1 21:37:56 PDT 2012


On 8/1/2012 4:18 PM, Jakob Ovrum wrote:
>   * Currently files are read in their entirety first, then parsed. It is worth
> exploring the idea of reading them lazily in chunks.

Using an input range will take care of that nicely.
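
For instance, a lexer templated on any input range of characters can pull
from a lazily chunked file. A minimal sketch (the Lexer shape here is a
hypothetical illustration, not sdc's or std.lexer's actual interface):

    import std.range.primitives : isInputRange, ElementType;

    struct Lexer(R) if (isInputRange!R && is(ElementType!R : dchar))
    {
        R src;   // characters are pulled lazily via front/popFront
        // ... token scanning reads from src on demand ...
    }

    // byChunk yields ubyte[] blocks; joiner flattens them into a lazy
    // range of bytes, read from disk only as the lexer consumes them.
    import std.stdio : File;
    import std.algorithm.iteration : joiner;

    void main()
    {
        auto input = File("module.d").byChunk(4096).joiner;
        auto lexer = Lexer!(typeof(input))(input);
    }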

>   * The current result (TokenStream) is a wrapper over a GC-allocated array of
> Token class instances, each instance with its own GC allocation (new Token). It
> is worth exploring an alternative allocation strategy for the tokens.

That's just not going to produce a high-performance lexer.

The way to do it is to keep, in the Lexer instance, a value that is the 
current Token. That way, in the normal case, one NEVER has to allocate a token 
instance.
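
A minimal sketch of that arrangement (TOK, Token, and scan are hypothetical
names, not dmd's actual ones):

    enum TOK { identifier, number, eof /* ... */ }

    struct Token
    {
        TOK kind;
        const(char)[] text;   // a slice of the source buffer, not a copy
    }

    struct Lexer
    {
        const(char)[] src;    // remaining source text
        Token token;          // the current token, held by value

        // Advancing overwrites `token` in place: the common,
        // no-lookahead path performs zero allocations.
        void advance() { scan(token); }

        private void scan(ref Token t)
        {
            // ... recognize the next token in src and fill in t ...
        }
    }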

Storage allocation is required only when lookahead is done, and that lookahead 
list should be held by the Lexer and recycled as tokens get consumed. This is 
how the dmd lexer works.
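
Something along these lines, with hypothetical names (the dmd lexer's actual
code differs in detail), extending the Lexer sketched above:

    struct TokenNode
    {
        Token tok;
        TokenNode* next;
    }

    struct Lexer
    {
        const(char)[] src;
        Token token;             // current token, as above

        TokenNode* lookahead;    // tokens scanned ahead, oldest first
        TokenNode* freeList;     // consumed nodes kept for reuse

        // Scan one more token ahead, reusing a free node when possible.
        // `new` runs only until the list reaches the deepest lookahead
        // the parser ever asks for; after that, everything is recycled.
        Token* peek()
        {
            TokenNode* n = freeList;
            if (n !is null)
                freeList = n.next;
            else
                n = new TokenNode;
            scan(n.tok);
            n.next = null;
            if (lookahead is null)
                lookahead = n;
            else
            {
                auto p = lookahead;
                while (p.next !is null)
                    p = p.next;
                p.next = n;          // keep tokens in source order
            }
            return &n.tok;
        }

        // Consume a token: drain the lookahead list first, returning each
        // node to the free list instead of leaving it for the GC.
        void advance()
        {
            if (lookahead !is null)
            {
                TokenNode* n = lookahead;
                lookahead = n.next;
                token = n.tok;
                n.next = freeList;
                freeList = n;
            }
            else
                scan(token);
        }

        private void scan(ref Token t) { /* ... */ }
    }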

Doing one allocation per token is never going to scale when shoving millions 
upon millions of lines of code through it.
