faster splitter

Andrei Alexandrescu via Digitalmars-d digitalmars-d at puremagic.com
Mon May 30 11:20:39 PDT 2016


On 05/30/2016 05:31 AM, qznc wrote:
> On Sunday, 29 May 2016 at 21:07:21 UTC, qznc wrote:
>> When looking at the assembly I don't like the single-byte loads. Since
>> string (ubyte[] here) is of extraordinary importance, it should be
>> worthwhile to use word loads [0] instead. Really fancy would be SSE.
>
> So far, the results look disappointing. Andrei find does not get faster
> with wordwise matching:
>
> ./benchmark.ldc
>      std find: 133 ±25    +38 (3384)  -19 (6486)
>   manual find: 140 ±37    +64 (2953)  -25 (6962)
>     qznc find: 114 ±17    +33 (2610)  -11 (7262)
>    Chris find: 146 ±39    +66 (3045)  -28 (6873)
>   Andrei find: 126 ±29    +54 (2720)  -19 (7189)
> Wordwise find: 130 ±30    +53 (2934)  -21 (6980)
>
> Interesting side-note: On my laptop Andrei find is faster than qznc find
> (for LDC), but on my desktop it reverses (see above). Both are Intel i7.
> Need to find a simpler processor. Maybe wordwise is faster there.
> Alternatively, find is purely memory bound and the L1 cache makes every
> difference disappear.
>
> Also, note how std find is faster than manual find! Finding a reliable
> benchmark is hard. :/
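
[The word loads mentioned in the quoted message can be illustrated as follows. This is a hedged sketch in C rather than D, and `equal_wordwise` is a hypothetical name, not qznc's actual code: it compares 8-byte chunks through `memcpy`'d `uint64_t` values, falling back to single bytes for the tail.]

```c
#include <stdint.h>
#include <string.h>
#include <stddef.h>

/* Illustrative sketch (not the benchmarked D code): replace byte-at-a-time
 * comparison with word-sized loads. memcpy into a uint64_t avoids the
 * undefined behavior of an unaligned pointer dereference; compilers turn
 * it into a single load on common targets. */
static int equal_wordwise(const char *a, const char *b, size_t len)
{
    while (len >= sizeof(uint64_t)) {
        uint64_t wa, wb;
        memcpy(&wa, a, sizeof wa);
        memcpy(&wb, b, sizeof wb);
        if (wa != wb)
            return 0;
        a += sizeof wa;
        b += sizeof wb;
        len -= sizeof wa;
    }
    /* tail: compare the remaining 0..7 bytes individually */
    while (len--)
        if (*a++ != *b++)
            return 0;
    return 1;
}
```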

Please throw this hat into the ring as well: it should improve average 
search on a large vocabulary dramatically.

https://dpaste.dzfl.pl/dc8dc6e1eb53

It uses a Boyer-Moore-inspired trick: once the last character has 
matched, a subsequently failed match needn't resume at the very next 
character in the haystack. The "skip" is computed lazily and in a 
separate function so as to keep the loop tight. All in all, a routine 
worth a look. I had wanted to write this for a long time. -- Andrei
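
[The trick described above can be sketched roughly as follows. This is a C approximation with hypothetical names (`bm_like_find`, `compute_skip`), not Andrei's actual D routine from the dpaste link: the hot loop only tests the needle's last character; the Horspool-style skip for that character is computed lazily, in a separate function, the first time a full comparison fails.]

```c
#include <stddef.h>
#include <string.h>

/* Lazily computed skip: distance from the needle's last character back to
 * its previous occurrence, or nlen if it occurs only at the end. */
static size_t compute_skip(const char *needle, size_t nlen)
{
    char last = needle[nlen - 1];
    for (size_t i = nlen - 1; i-- > 0; )
        if (needle[i] == last)
            return nlen - 1 - i;
    return nlen;
}

/* Sketch of a BM-inspired find: check the last character first and keep
 * the loop tight; compute the skip only after the first failed full match. */
const char *bm_like_find(const char *hay, size_t hlen,
                         const char *needle, size_t nlen)
{
    if (nlen == 0)
        return hay;
    if (hlen < nlen)
        return NULL;

    size_t skip = 0;               /* 0 means "not computed yet" */
    char last = needle[nlen - 1];

    for (size_t pos = nlen - 1; pos < hlen; ) {
        if (hay[pos] != last) {    /* cheap test dominates the loop */
            ++pos;
            continue;
        }
        if (memcmp(hay + pos - (nlen - 1), needle, nlen - 1) == 0)
            return hay + pos - (nlen - 1);
        if (skip == 0)
            skip = compute_skip(needle, nlen);  /* lazy, out of the loop */
        pos += skip;               /* needn't restart at the next character */
    }
    return NULL;
}
```

[On a mismatch after the last character matched, sliding the needle so that the previous occurrence of that character lines up is exactly the bad-character shift of Boyer-Moore-Horspool, restricted to a single character.]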

