Request for comments: std.d.lexer

dennis luehring dl.soluz at gmx.net
Wed Jan 30 02:20:41 PST 2013


Am 30.01.2013 10:49, schrieb Brian Schott:
> On Monday, 28 January 2013 at 21:03:21 UTC, Timon Gehr wrote:
>> Better, but still slow.
>
> I implemented the various suggestions from a past thread, made
> the lexer work only on ubyte[] (to avoid Phobos converting
> everything to dchar all the time), and gave the tokenizer instance
> a character buffer that it re-uses.
>
> Results:
>
> $ avgtime -q -r 200 ./dscanner --tokenCount
> ../phobos/std/datetime.d
>
> .....
>
> If my math is right, that means it's getting 4.9 million
> tokens/second now. According to Valgrind the only way to really
> improve things now is to require that the input to the lexer
> support slicing. (Remember the secret of Tango's XML parser...)
> The bottleneck is now on the calls to .idup to construct the
> token strings from slices of the buffer.

But you still need to compare that against the current DMD lexer; nothing
else can serve as the benchmark reference.


More information about the Digitalmars-d mailing list