What would need to be done to get sdc.lexer to std.lexer quality?

Jakob Ovrum jakobovrum at gmail.com
Wed Aug 1 22:31:35 PDT 2012


On Thursday, 2 August 2012 at 04:38:11 UTC, Walter Bright wrote:
> That's just not going to produce a high performance lexer.
>
> The way to do it is in the Lexer instance, have a value which 
> is the current Token instance. That way, in the normal case, 
> one NEVER has to allocate a token instance.
>
> Only when lookahead is done is storage allocation required, and 
> that list should be held by Lexer and recycled as tokens get 
> consumed. This is how the dmd lexer works.
>
> Doing one allocation per token is never going to scale to 
> trying to shove millions upon millions of lines of code through 
> it.

Which is exactly why I'm pointing out that the current approach 
is poor. Having a single array of contiguous Tokens for lookahead 
is completely doable even when Token is a class, by composing 
GC.malloc and emplace. I think SDC's Token class is too big to be 
useful as a struct; you'd pretty much never want to pass it 
anywhere by value.
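
A rough sketch of the GC.malloc and emplace idea follows. It is not 
SDC's code: Token here is a hypothetical cut-down stand-in for the 
real class, and TokenBuffer, put and slot are names made up for 
illustration. It assumes Token has no destructor that must run 
before a slot is reused.

```d
import core.memory : GC;
import std.conv : emplace;
import std.traits : classInstanceAlignment;

// Hypothetical stand-in for SDC's Token class; the real one has more fields.
class Token
{
    int type;
    string value;

    this(int type, string value)
    {
        this.type = type;
        this.value = value;
    }
}

// Illustrative lookahead buffer: all Token instances live side by side
// in one GC block, so filling a slot never allocates.
struct TokenBuffer
{
    enum size = __traits(classInstanceSize, Token);
    enum alignment = classInstanceAlignment!Token;
    // Distance between slots: instance size rounded up to the alignment.
    enum stride = (size + alignment - 1) / alignment * alignment;

    private void[] storage;

    this(size_t capacity)
    {
        // A single allocation up front. The block is scanned by the GC
        // (no NO_SCAN attribute), so strings held by tokens stay alive.
        storage = GC.malloc(stride * capacity)[0 .. stride * capacity];
    }

    // Raw memory for slot i.
    private void[] slot(size_t i)
    {
        return storage[i * stride .. i * stride + size];
    }

    // Construct a Token in place in slot i, overwriting what was there.
    Token put(size_t i, int type, string value)
    {
        return emplace!Token(slot(i), type, value);
    }
}

void main()
{
    auto buf = TokenBuffer(4);

    Token a = buf.put(0, 1, "foo");
    Token b = buf.put(1, 2, "bar");
    assert(a.value == "foo" && b.type == 2);

    // The two instances are adjacent in memory.
    assert(cast(ubyte*) cast(void*) b - cast(ubyte*) cast(void*) a
        == TokenBuffer.stride);

    // Recycling slot 0 reuses the same storage; no new allocation.
    Token c = buf.put(0, 3, "baz");
    assert(c is a);
    assert(a.value == "baz");
}
```

Note the catch the last two asserts show: a recycled slot changes 
under any reference still pointing at it, so the lexer has to own 
the buffer and decide when a token is really consumed.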

