Use SIMD to accelerate comment lexing
deadalnix via Digitalmars-d
digitalmars-d at puremagic.com
Thu Jun 4 14:44:46 PDT 2015
On Thursday, 4 June 2015 at 18:39:02 UTC, Walter Bright wrote:
> On 6/3/2015 7:05 PM, deadalnix wrote:
>> On Wednesday, 3 June 2015 at 22:50:52 UTC, Walter Bright wrote:
>>> On 6/2/2015 5:45 PM, deadalnix wrote:
>>>> You go through characters and look for a '/'. When you hit
>>>> one, you check if the
>>>> character before it is a *, and if so, you have the end of
>>>> the comment. There is
>>>> obviously various edge cases to take into account, but that
>>>> is the general
>>>> idea.
>>> Line numbers have to be kept track of as well.
>>
>> They retrieve line number lazily when needed, with various
>> mechanism to speedup
>> the lookup.
>
> Hmm. There's no way to get the line number without counting
> LFs, and that means searching for them.
Yes, the first time you query a line number, clang builds
metadata about newlines by going through the file's content and
finding the position of each newline. The process uses vector
operations as well.
Apparently, they think it is better to do it that way for
various reasons:
- Position tracking is more compact (and a position is embedded
in every expression, declaration, and more), which reduces the
memory footprint by quite a lot.
- It makes the lexer simpler and faster.
- You don't need to track newlines if you don't use them. If
you don't emit debug info in C++ and have no errors, most line
numbers are never used (not sure about D, because various
language facilities like bounds checking use line numbers, but
that is a win in C++).
- Debug info emission has a predictable access pattern, and the
algorithm that finds a line number from an offset in the file is
special-cased to handle it.
- Finding newlines can be vectorized over the whole file. It
cannot be vectorized when done in parallel with lexing.
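
To make the lazy scheme concrete, here is a rough C++ sketch. It
is not clang's actual code: the class name LazyLineTable and its
members are made up for illustration. The assumption is that the
lexer stores only byte offsets, and that the newline index is
built on the first line-number query by one pass over the whole
buffer. memchr stands in for the vectorized scan, since C
libraries usually implement it with SIMD, though that is not
guaranteed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not clang's real class): maps a byte offset to a
// 1-based line number, building the newline index only on first use.
class LazyLineTable {
public:
    explicit LazyLineTable(std::string text) : text_(std::move(text)) {}

    // Returns the 1-based line number that contains byte `offset`.
    // A '\n' byte is counted as part of the line it terminates.
    std::size_t lineOf(std::size_t offset) {
        assert(offset <= text_.size());
        if (!built_) build();
        // The line number equals the count of line starts <= offset.
        auto it = std::upper_bound(lineStarts_.begin(), lineStarts_.end(), offset);
        return static_cast<std::size_t>(it - lineStarts_.begin());
    }

    // True once the newline index has been built.
    bool isBuilt() const { return built_; }

private:
    // One pass over the whole buffer, recording where each line starts.
    void build() {
        lineStarts_.push_back(0);  // line 1 starts at offset 0
        const char* base = text_.data();
        const char* end = base + text_.size();
        const char* p = base;
        while (p < end) {
            const void* hit = std::memchr(p, '\n', static_cast<std::size_t>(end - p));
            if (!hit) break;
            p = static_cast<const char*>(hit) + 1;  // byte after the '\n'
            lineStarts_.push_back(static_cast<std::size_t>(p - base));
        }
        built_ = true;
    }

    std::string text_;
    std::vector<std::size_t> lineStarts_;  // sorted offsets of line starts
    bool built_ = false;
};
```

If nobody ever calls lineOf (no errors, no debug info), the index
is never built and the only cost is one offset per location. The
lookup is a plain binary search here; the special casing for
predictable access patterns mentioned above is left out.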
Once again, I'm not sure this is a win in D, because we need line
numbers more than in C++, but it seems to be a win in C++.
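
For reference, the comment scan quoted at the top of this message
(look for a '/', then check whether the character before it is a
'*') can be sketched along the same lines. Again this is only an
illustration under assumptions: findBlockCommentEnd is a made-up
name, the input is assumed to start right after the opening "/*",
nested comments are not handled, and memchr stands in for an
explicit SIMD search.

```cpp
#include <cstddef>
#include <cstring>
#include <string_view>

// Hypothetical helper: `src` is the text immediately after an opening "/*".
// Returns the offset one past the closing "*/", or npos if the comment
// is unterminated.
inline std::size_t findBlockCommentEnd(std::string_view src) {
    const char* base = src.data();
    std::size_t pos = 0;
    while (pos < src.size()) {
        // Jump to the next '/' in the remaining text.
        const void* hit = std::memchr(base + pos, '/', src.size() - pos);
        if (!hit) break;
        const std::size_t slash =
            static_cast<std::size_t>(static_cast<const char*>(hit) - base);
        // Edge case: a '/' at offset 0 would pair with the '*' of the
        // opening "/*", which must not count as a terminator.
        if (slash > 0 && src[slash - 1] == '*') return slash + 1;
        pos = slash + 1;
    }
    return std::string_view::npos;
}
```

Since this loop never looks at newlines, it only works out if
line numbers are recovered later from offsets, which is the
trade-off discussed above.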