std.d.lexer: pre-voting review / discussion

Wed Sep 11 22:10:08 PDT 2013

On Thursday, 12 September 2013 at 04:57:50 UTC, Walter Bright wrote:
> On 9/11/2013 6:54 PM, deadalnix wrote:
>> On Thursday, 12 September 2013 at 01:39:52 UTC, Walter Bright wrote:
>>> On 9/11/2013 6:30 PM, deadalnix wrote:
>>>> Indeed. What solution do you have in mind?
>>>
>>> The solution dmd uses is to put in an intermediary layer that
>>> saves the lookahead tokens in a linked list.
>>
>> But then you have an extra step when looking up every token, plus
>> memory management overhead. How big is the performance improvement?
>
> Not really - I use a dirty trick in that a static instance is 
> always the start of the list.
>

If I understand you correctly, that means that a lookahead of one 
token does not trigger any allocation.
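
To make that reading concrete, here is a minimal C++ sketch of a lookahead list whose first node is embedded in the lexer itself. It is only an illustration of the idea as I understand it, not dmd's actual code: the names `Lexer`, `Token`, `peek`, `advance` and the `allocations()` counter are all made up for the example, and the scanner only knows identifiers and parentheses.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

enum class Tok { Ident, LParen, RParen, End };

struct Token {
    Tok kind = Tok::End;
    Token* next = nullptr; // next buffered lookahead token, if any
};

// Hypothetical toy lexer: the head of the token list is a member,
// so looking at the current token never touches the heap.
class Lexer {
public:
    explicit Lexer(std::string src) : src_(std::move(src)) { scan(head_); }
    Lexer(const Lexer&) = delete;
    Lexer& operator=(const Lexer&) = delete;
    ~Lexer() {
        for (Token* t = head_.next; t != nullptr;) {
            Token* n = t->next;
            delete t;
            t = n;
        }
    }

    const Token& current() const { return head_; }
    std::size_t allocations() const { return allocations_; }

    // Consume the current token. If lookahead tokens are buffered,
    // the first one is copied into the embedded head node.
    void advance() {
        if (Token* n = head_.next) {
            head_.kind = n->kind;
            head_.next = n->next;
            delete n;
        } else {
            scan(head_);
        }
    }

    // Look k tokens past the current one (k == 0 is the current token).
    // Only nodes beyond the embedded head are heap-allocated.
    const Token& peek(std::size_t k) {
        Token* t = &head_;
        while (k--) {
            if (t->next == nullptr) {
                t->next = new Token;
                ++allocations_;
                scan(*t->next);
            }
            t = t->next;
        }
        return *t;
    }

private:
    // Scan one token from the source into t.
    void scan(Token& t) {
        while (pos_ < src_.size() && src_[pos_] == ' ') ++pos_;
        t.next = nullptr;
        if (pos_ >= src_.size()) { t.kind = Tok::End; return; }
        const char c = src_[pos_];
        if (c == '(') { t.kind = Tok::LParen; ++pos_; return; }
        if (c == ')') { t.kind = Tok::RParen; ++pos_; return; }
        t.kind = Tok::Ident;
        while (pos_ < src_.size() && src_[pos_] != ' ' &&
               src_[pos_] != '(' && src_[pos_] != ')')
            ++pos_;
    }

    std::string src_;
    std::size_t pos_ = 0;
    Token head_;                  // embedded start of the list
    std::size_t allocations_ = 0; // heap nodes created by peek()
};
```

With this layout, a parser that only ever inspects `current()` and calls `advance()` never allocates; the extra cost is the `head_.next` check in `advance()`, and allocation only happens when `peek` goes past the head. A real implementation would presumably recycle nodes through a free list rather than calling `new` and `delete` each time.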

> But even so, an extra indirection is better than re-lexing. 
> Lexing is a clear bottleneck in the profiling. I've even been 
> thinking of special casing the code that scans comments to use 
> SIMD instructions.

See my comment: it is possible, at the cost of increased parser 
complexity, to handle many cases where you don't yet know what you 
are parsing. With that approach, lookahead is only required to find 
the matching closing token. I suspect that a fast path in the lexer 
for that precise use case may be faster than buffering tokens, as 
it saves one branch per token.
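
As a rough illustration of the kind of fast path I mean, the C++ sketch below skips from an opening bracket to its matching closer without producing or buffering any tokens. The function name `findMatchingClose` is hypothetical, and the sketch makes simplifying assumptions: it only understands double-quoted strings with backslash escapes, and it ignores comments, character literals and the other string forms a real D lexer has to handle. Whether this actually beats token buffering would need to be measured.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: returns the index of the bracket that closes the
// one at src[open], or std::string::npos if there is no match.
// Only brackets of the same kind as the opener are counted.
std::size_t findMatchingClose(const std::string& src, std::size_t open) {
    if (open >= src.size()) return std::string::npos;
    const char opener = src[open];
    char closer;
    switch (opener) {
        case '(': closer = ')'; break;
        case '[': closer = ']'; break;
        case '{': closer = '}'; break;
        default: return std::string::npos; // not positioned on a bracket
    }
    std::size_t depth = 0;
    for (std::size_t i = open; i < src.size(); ++i) {
        const char c = src[i];
        if (c == '"') {
            // Skip a string literal so brackets inside it are ignored.
            for (++i; i < src.size() && src[i] != '"'; ++i)
                if (src[i] == '\\') ++i; // skip the escaped character
        } else if (c == opener) {
            ++depth;
        } else if (c == closer) {
            if (--depth == 0) return i;
        }
    }
    return std::string::npos; // unbalanced input
}
```

For example, on the input `foo(a, (b), ")") x` with `open` pointing at the first `(`, the function returns the index of the last `)`; the parser can then look at what follows the closer to decide what it is parsing, and lex the skipped range normally afterwards.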