(SIMD) Optimized multi-byte chunk scanning
Igor via Digitalmars-d-learn
digitalmars-d-learn at puremagic.com
Fri Aug 25 02:40:28 PDT 2017
On Wednesday, 23 August 2017 at 22:07:30 UTC, Nordlöw wrote:
> I recall seeing some C/C++/D code that optimizes the comment-
> and whitespace-skipping parts (tokens) of lexers by operating
> on 2-, 4- or 8-byte chunks instead of single bytes. This applies
> when the token terminators are expressed as sets of (alternative)
> ASCII characters.
>
> For instance, when searching for the end of a line comment, I
> would like to speed up the while-loop in
>
> size_t offset;
> string input = "// \n"; // a line-comment string
> import std.algorithm : among;
> // until end-of-line or file terminator
> while (!input[offset].among!('\0', '\n', '\r'))
> {
> ++offset;
> }
>
> by taking `offset`-steps larger than one.
>
> Note that my file-reading function, which creates the real
> `input`, appends a '\0' at the end to enable the sentinel-based
> search shown in the call to `among` above.
>
> I further recall that there are x86_64 intrinsics that can be
> used here for further speedups.
>
> Refs, anyone?
For line comments it doesn't sound like it will pay off, since you
would have to do extra work to make sure you operate on 16-byte
aligned memory. For multi-line comments, maybe.
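That said, unaligned 8-byte loads are cheap on x86_64, so one way to
sidestep the alignment issue is SWAR ("SIMD within a register") on
plain ulong reads. Here is a minimal sketch; the helper name
`skipToLineEnd` and the zero-byte bit trick are mine, not from your
post, and it assumes the '\0' sentinel you describe:

import core.bitop : bsf;
import std.algorithm.comparison : among;

// Hypothetical helper: advance to the first '\0', '\n' or '\r' in
// `input`, scanning 8 bytes at a time. Assumes `input` ends with a
// '\0' sentinel and that unaligned ulong loads are fine (x86_64).
size_t skipToLineEnd(const(char)[] input, size_t offset)
{
    enum ulong lo = 0x0101010101010101UL;
    enum ulong hi = 0x8080808080808080UL;

    // Classic zero-byte trick: sets the high bit of every zero byte.
    // It can also flag bytes above the first hit, which is harmless
    // here because we only use the lowest flagged byte.
    static ulong zeroes(ulong v) { return (v - lo) & ~v & hi; }
    // A byte equals `c` iff XORing with a broadcast of `c` zeroes it.
    static ulong eq(ulong v, char c) { return zeroes(v ^ (lo * c)); }

    while (offset + 8 <= input.length)
    {
        const chunk = *cast(const(ulong)*)(input.ptr + offset);
        const hits = zeroes(chunk) | eq(chunk, '\n') | eq(chunk, '\r');
        if (hits != 0)
            return offset + (bsf(hits) >> 3); // first hit (little-endian)
        offset += 8;
    }
    // Tail: byte-at-a-time loop relying on the '\0' sentinel.
    while (!input[offset].among!('\0', '\n', '\r'))
        ++offset;
    return offset;
}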
As for a nice reference for Intel intrinsics:
https://software.intel.com/sites/landingpage/IntrinsicsGuide/
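If you do go the SSE2 route, the entries to look up in that guide are
_mm_loadu_si128, _mm_cmpeq_epi8 and _mm_movemask_epi8 (unaligned
load, byte-wise compare, mask extraction); the unaligned load also
sidesteps the alignment problem. And a quick sanity check of the
SWAR sketch above:

unittest
{
    // '\0' sentinel appended, as in the original question.
    string input = "// a comment\nnext line\0";
    assert(skipToLineEnd(input, 0) == 12); // index of the '\n'
}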