Lexer and parser generators using CTFE

Martin Nowak dawg at dawgfoto.de
Wed Feb 29 15:04:39 PST 2012


On Wed, 29 Feb 2012 21:30:57 +0100, Timon Gehr <timon.gehr at gmx.ch> wrote:

> On 02/29/2012 09:03 PM, Martin Nowak wrote:
>> On Wed, 29 Feb 2012 20:30:43 +0100, Timon Gehr <timon.gehr at gmx.ch>  
>> wrote:
>>
>>> On 02/29/2012 07:28 PM, Martin Nowak wrote:
>>>> On Wed, 29 Feb 2012 17:41:19 +0100, Timon Gehr <timon.gehr at gmx.ch>
>>>> wrote:
>>>>
>>>>> On 02/28/2012 07:46 PM, Martin Nowak wrote:
>>>>>>
>>>>>> https://gist.github.com/1255439 - lexer generator
>>>>>> https://gist.github.com/1262321 - complete and fast D lexer
>>>>>>
>>>>>
>>>>> Well, it is slower at lexing than DMD at parsing. What is the
>>>>> bottleneck?
>>>>
>>>> No, it's as fast as dmd's lexer.
>>>>
>>>> Writing the tokens to stdout takes a lot of time though.
>>>> Just disable the "writeln(tok);" in the main loop.
>>>
>>> I did that.
>>
>> Interesting, I've commented it out https://gist.github.com/1262321#L1559
>> and get the following.
>>
>> <<<
>> PHOBOS=~/Code/D/DPL/phobos
>> mkdir test_lexer
>> cd test_lexer
>> curl https://raw.github.com/gist/1255439/lexer.d > lexer.d
>> curl https://raw.github.com/gist/1262321/dlexer.d > dlexer.d
>> curl https://raw.github.com/gist/1262321/entity.d > entity.d
>> dmd -O -release -inline dlexer lexer entity
>> wc -l ${PHOBOS}/std/*.d
>> time ./dlexer ${PHOBOS}/std/*.d
>> >>>
>> ./dlexer ${PHOBOS}/std/*.d 0.21s user 0.00s system 99% cpu 0.211 total
>
> I get 0.160s for lexing using your lexer.
> Parsing the same file with DMD's parser takes 0.155 seconds. The
> difference grows with larger files.

Mmh, I've retested and you're right: dmd's lexer is about 2x faster.
The main overhead stems from using ranges and enforce.

Quick profiling shows that 25% of the time is spent in popFront and
std.utf.stride. Last time I worked on this I rewrote std.utf.decode to
be much faster. But UTF characters are still "decoded" twice, once for
front and then again for popFront. Also, stride uses a table lookup and
can't be inlined.
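
To illustrate the double decoding, here is a minimal sketch, not the code
from the gists. The function names sumViaRange and sumViaDecode are made
up for this example, and the import path std.range.primitives assumes a
current Phobos. The first loop goes through front and popFront, so each
multi-byte sequence is looked at twice; the second calls std.utf.decode
with an index, which returns the code point and advances the index in a
single call.

```d
import std.range.primitives : empty, front, popFront;
import std.utf : decode;

// Range-style loop: `front` decodes the code point, then `popFront`
// looks at the same leading bytes again to learn how far to advance.
uint sumViaRange(string s)
{
    uint sum;
    while (!s.empty)
    {
        sum += s.front; // decodes the sequence
        s.popFront();   // re-examines it to compute the stride
    }
    return sum;
}

// Index-style loop: `decode` returns the code point and advances `i`
// past it in the same call, so each sequence is examined once.
uint sumViaDecode(string s)
{
    uint sum;
    size_t i;
    while (i < s.length)
        sum += decode(s, i);
    return sum;
}
```

Both functions should return the same checksum for any valid UTF-8
string; how much the second form actually saves would have to be
measured.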

If switch tables were implemented on x64, one could use them for an
integral ElementType.
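
As a rough sketch of what that could look like, assuming "switch tables"
means jump tables generated for a dense switch: the Kind enum and the
classify function below are hypothetical and do not come from the gists.
The input is a single code unit, so the switch works on a small integral
value, and whether the backend turns it into a table is up to the
compiler.

```d
// Hypothetical character classes for a lexer's first dispatch step.
enum Kind { ident, digit, space, other }

// Dispatch on one code unit with a dense switch. A backend that
// supports jump tables can lower this to a single indexed branch.
Kind classify(char c)
{
    switch (c)
    {
        case 'a': .. case 'z':
        case 'A': .. case 'Z':
        case '_':
            return Kind.ident;
        case '0': .. case '9':
            return Kind.digit;
        case ' ', '\t', '\n', '\r':
            return Kind.space;
        default:
            return Kind.other;
    }
}
```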

