std.d.lexer requirements

Walter Bright newshound2 at digitalmars.com
Wed Aug 1 22:09:05 PDT 2012


On 8/1/2012 9:54 PM, Jonathan M Davis wrote:
> Then just pass the same identifier table to the function which creates the
> token range. That doesn't require another type.

You're still going to require another type; otherwise you'll have to duplicate 
the lexer state in every token allocation, with resultant heavy memory and 
initialization costs.
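
To make the point concrete, here is a minimal sketch of the separation being argued for: the shared state lives once in a lexer type, and each token stays a couple of machine words. Every name here (IdentifierTable, Token, Lexer, isIdentChar) is hypothetical and chosen for illustration; this is not the std.d.lexer API, and the "lexer" only recognizes identifier-like runs.

```d
// Hypothetical sketch, not the std.d.lexer API.

// Maps identifier text to a small integer id, assigned in order of first use.
struct IdentifierTable
{
    private uint[string] ids;

    uint intern(string name)
    {
        if (auto p = name in ids)
            return *p;
        immutable id = cast(uint) ids.length;
        ids[name] = id;
        return id;
    }
}

// A token carries no lexer state: 8 bytes in total.
struct Token
{
    uint id;      // id handed out by the shared table
    uint offset;  // start of the token in the source text
}

// Simplification: letters, digits and '_' all count as identifier characters.
bool isIdentChar(char c) pure nothrow
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
        || (c >= '0' && c <= '9') || c == '_';
}

// The state that would otherwise be copied into every token lives here once.
struct Lexer
{
    IdentifierTable* table;  // shared between lexers, owned by the caller
    string source;
    size_t pos;

    // Fills tok with the next identifier; returns false at end of input.
    bool next(out Token tok)
    {
        while (pos < source.length && !isIdentChar(source[pos]))
            ++pos;
        if (pos == source.length)
            return false;
        immutable start = pos;
        while (pos < source.length && isIdentChar(source[pos]))
            ++pos;
        tok = Token(table.intern(source[start .. pos]), cast(uint) start);
        return true;
    }
}

unittest
{
    IdentifierTable table;
    auto a = Lexer(&table, "foo bar foo");
    auto b = Lexer(&table, "bar baz");
    Token t;
    assert(a.next(t) && t.id == 0);   // foo
    assert(a.next(t) && t.id == 1);   // bar
    assert(a.next(t) && t.id == 0);   // foo again, same id
    assert(b.next(t) && t.id == 1);   // bar, id shared across lexers
    static assert(Token.sizeof == 8);
}
```

Two lexers built over the same table agree on identifier ids, and the per-token cost stays fixed no matter how much state the lexer itself carries.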

Please keep in mind that a lexer is not something you just pass a few short 
strings to. It's very, very, very performance critical, as all those extra 
instructions add up to long delays when you're shoving millions of lines of code 
into its maw.

For the same reason you're also not going to want the lexer putting pressure on 
the GC. It could bring your whole system down.
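
One common way to keep a lexer off the GC heap is to hand out slices of the input buffer instead of freshly allocated copies of each token's text. The sketch below assumes a hypothetical nextWord helper that splits on single spaces only; the @nogc attribute used to enforce the no-allocation property is available in current D compilers.

```d
// Hypothetical helper, for illustration only: returns the next
// space-delimited word as a slice of src and advances src past it.
// An empty result means the input is exhausted.
const(char)[] nextWord(ref const(char)[] src) @nogc nothrow pure @safe
{
    size_t i = 0;
    while (i < src.length && src[i] == ' ')   // skip blanks
        ++i;
    immutable start = i;
    while (i < src.length && src[i] != ' ')   // scan the word
        ++i;
    auto word = src[start .. i];   // a view into the caller's buffer, no copy
    src = src[i .. $];
    return word;
}

unittest
{
    const(char)[] buf = "int  x";
    auto src = buf;
    auto first = nextWord(src);
    assert(first == "int");
    assert(first.ptr is buf.ptr);      // same memory, nothing allocated
    assert(nextWord(src) == "x");
    assert(nextWord(src).length == 0); // end of input
}
```

The trade-off is that the slices are only valid while the source buffer is alive, so the caller has to keep it around for as long as the tokens are in use.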

To get a high performance lexer, you're going to be counting the average number 
of instructions executed per input character. Each one counts. Shaving one off 
is a victory. You should also be thinking about memory cache access patterns.

