Writing a really fast lexer

vnr cfcr at gmail.com
Sat Dec 12 18:15:11 UTC 2020


On Saturday, 12 December 2020 at 16:43:43 UTC, Bastiaan Veelo 
wrote:
> On Friday, 11 December 2020 at 19:49:12 UTC, vnr wrote:
>> For a project that needs good performance, I need to be able 
>> to analyse text. To do so, I would write a parser by hand 
>> using the recursive descent algorithm, based on a stream of 
>> tokens. I started writing a lexer with the d-lex package 
>> (https://code.dlang.org/packages/d-lex). It works really well; 
>> unfortunately, it's quite slow for the number of lines I'm 
>> aiming to analyse (in a test on a million lines, it took 
>> about 3 minutes). As the parser will only have to manipulate 
>> tokens, I think the performance of the lexer is the more 
>> important thing to consider. Therefore, I wonder what 
>> resources there are, in D, for writing an efficient lexer.
>
> Have you looked at Pegged [1]? It will give you the lexer and 
> parser in one go. I'd be very interested to see how it performs 
> on that kind of input.
>
> -- Bastiaan.
>
> [1] https://code.dlang.org/packages/pegged

Yes, I know Pegged; it's a really interesting parser generator. 
Unfortunately, the grammar of what I would like to analyse is not 
a PEG. But I am also curious to know how this tool performs on 
very large inputs.
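
For reference, here is roughly the kind of hand-written lexer I have 
in mind: a single forward pass over the input, no regular expressions, 
and tokens that are slices of the source text rather than copies. The 
token kinds below are only placeholders, not my actual grammar:

import std.ascii : isAlpha, isAlphaNum, isDigit, isWhite;
import std.stdio : writeln;

enum TokKind { identifier, number, plus, minus, lparen, rparen, eof }

struct Token
{
    TokKind kind;
    string text;   // slice of the source, not a copy
}

struct Lexer
{
    string src;
    size_t pos;

    Token next()
    {
        // Skip whitespace.
        while (pos < src.length && isWhite(src[pos]))
            ++pos;

        if (pos >= src.length)
            return Token(TokKind.eof, "");

        immutable start = pos;
        immutable c = src[pos];

        // Identifiers: a letter or underscore followed by alphanumerics.
        if (isAlpha(c) || c == '_')
        {
            while (pos < src.length && (isAlphaNum(src[pos]) || src[pos] == '_'))
                ++pos;
            return Token(TokKind.identifier, src[start .. pos]);
        }

        // Integer literals.
        if (isDigit(c))
        {
            while (pos < src.length && isDigit(src[pos]))
                ++pos;
            return Token(TokKind.number, src[start .. pos]);
        }

        // Single-character punctuation.
        ++pos;
        switch (c)
        {
            case '+': return Token(TokKind.plus, src[start .. pos]);
            case '-': return Token(TokKind.minus, src[start .. pos]);
            case '(': return Token(TokKind.lparen, src[start .. pos]);
            case ')': return Token(TokKind.rparen, src[start .. pos]);
            default:  throw new Exception("unexpected character: " ~ src[start .. pos]);
        }
    }
}

void main()
{
    auto lex = Lexer("foo + (bar - 42)");
    for (auto tok = lex.next(); tok.kind != TokKind.eof; tok = lex.next())
        writeln(tok.kind, " '", tok.text, "'");
}

The idea is that the recursive descent parser just calls next() 
whenever it wants another token, so no intermediate token array is 
allocated at all.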

