Use SIMD to accelerate comment lexing

Walter Bright via Digitalmars-d digitalmars-d at puremagic.com
Thu Jun 4 17:30:47 PDT 2015


On 6/4/2015 2:44 PM, deadalnix wrote:
> On Thursday, 4 June 2015 at 18:39:02 UTC, Walter Bright wrote:
>> On 6/3/2015 7:05 PM, deadalnix wrote:
>>> On Wednesday, 3 June 2015 at 22:50:52 UTC, Walter Bright wrote:
>>>> On 6/2/2015 5:45 PM, deadalnix wrote:
>>>>> You go through the characters and look for a '/'. When you hit one, you check
>>>>> if the character before it is a '*', and if so, you have the end of the
>>>>> comment. There are obviously various edge cases to take into account, but
>>>>> that is the general idea.
>>>> Line numbers have to be kept track of as well.
>>>
>>> They retrieve line numbers lazily when needed, with various mechanisms to
>>> speed up the lookup.
>>
>> Hmm. There's no way to get the line number without counting LFs, and that
>> means searching for them.
>
> Yes, the first time you query a line number, clang builds metadata about
> newlines by going through the file's content and finding the positions of the
> newlines. The process uses vector operations as well.
>
> Apparently, they think it is better to do it that way for various reasons:
>   - Position tracking is more compact (and a position is embedded in every
> expression, declaration, and more), which reduces the memory footprint by quite
> a lot.
>   - It makes the lexer simpler and faster.
>   - You don't need to track newlines if you don't use them. If you don't emit
> debug info in C++, and have no errors, most line numbers are not used (not sure
> about D, because various language facilities like bounds checking use line
> numbers, but that is a win in C++).
>   - Debug emission has some predictable access patterns, and the algorithms
> that find a line number from an offset in the file are special-cased to handle
> them.
>   - Finding newlines can be vectorized over the whole file. It cannot be
> vectorized when done in parallel with lexing.
>
> Once again, I'm not sure this is a win in D, because we need line numbers more
> than in C++, but it seems to be a win in C++.

It's an interesting approach. I generally shoot for making the debug builds the 
fastest, because that's when people are in the edit-compile-debug loop. And the 
debug output needs line numbers :-)
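
For reference, the lazy lookup discussed above could look roughly like the
following. This is only a minimal sketch of the general idea, not clang's
actual code: the class name LazyLineTable and its members are hypothetical,
and the newline scan is a plain scalar loop standing in for a vectorized one.
The lexer would store only byte offsets, and the table of line starts is built
on the first query and then searched with a binary search.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical helper (not clang's API): maps byte offsets to line numbers,
// building the table of line starts only when the first query arrives.
class LazyLineTable {
public:
    explicit LazyLineTable(std::string_view source) : source_(source) {}

    // Returns the 1-based line number containing the given byte offset.
    std::size_t lineOf(std::size_t offset) {
        assert(offset <= source_.size());
        if (!built_) build();
        // The first line start greater than offset belongs to the next line,
        // so its index in the table equals the 1-based line number we want.
        auto it = std::upper_bound(lineStarts_.begin(), lineStarts_.end(), offset);
        return static_cast<std::size_t>(it - lineStarts_.begin());
    }

    // True once the table has been built, i.e. after the first query.
    bool built() const { return built_; }

private:
    void build() {
        lineStarts_.push_back(0);  // line 1 starts at offset 0
        // One pass over the whole buffer looking for LFs. A loop this simple
        // is the part that can be vectorized, since it is not interleaved
        // with lexing.
        for (std::size_t i = 0; i < source_.size(); ++i)
            if (source_[i] == '\n') lineStarts_.push_back(i + 1);
        built_ = true;
    }

    std::string_view source_;              // not owned; must outlive the table
    std::vector<std::size_t> lineStarts_;  // sorted offsets of line starts
    bool built_ = false;
};
```

If no error or debug info ever asks for a line, build() never runs, which is
the saving described for the C++ case. A debug build queries it for nearly
every statement, so there the cost is only moved, not removed.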


