faster splitter

Mon May 30 02:31:48 PDT 2016

On Sunday, 29 May 2016 at 21:07:21 UTC, qznc wrote:
> When looking at the assembly I don't like the single-byte 
> loads. Since string (ubyte[] here) is of extraordinary 
> importance, it should be worthwhile to use word loads [0] 
> instead. Really fancy would be SSE.

So far, the results look disappointing. Andrei find does not get 
faster with wordwise matching (compare the last two rows):

./benchmark.ldc
     std find: 133 ±25    +38 (3384)  -19 (6486)
  manual find: 140 ±37    +64 (2953)  -25 (6962)
    qznc find: 114 ±17    +33 (2610)  -11 (7262)
   Chris find: 146 ±39    +66 (3045)  -28 (6873)
  Andrei find: 126 ±29    +54 (2720)  -19 (7189)
Wordwise find: 130 ±30    +53 (2934)  -21 (6980)
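
For readers wondering what "wordwise matching" means here, the 
following is a minimal sketch in C. It is not the D code that was 
benchmarked above. The names wordwise_find and load32 are made up 
for illustration, and the 4-byte word size is an assumption. The 
idea is only to reject most candidate positions with one word 
load and compare instead of several single-byte loads.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: load 4 bytes as one word. Using memcpy keeps the
   unaligned access well defined; compilers typically emit a single load. */
static uint32_t load32(const unsigned char *p)
{
    uint32_t w;
    memcpy(&w, p, sizeof w);
    return w;
}

/* Hypothetical sketch, not the benchmarked implementation.
   Returns the index of the first occurrence of needle in hay,
   or hlen if there is no occurrence. */
size_t wordwise_find(const unsigned char *hay, size_t hlen,
                     const unsigned char *needle, size_t nlen)
{
    if (nlen == 0) return 0;        /* empty needle matches at the start */
    if (nlen > hlen) return hlen;   /* needle cannot fit */

    size_t last = hlen - nlen;      /* last valid start position */

    if (nlen < 4) {
        /* Needle is shorter than one word: fall back to bytewise compare. */
        for (size_t i = 0; i <= last; i++)
            if (memcmp(hay + i, needle, nlen) == 0)
                return i;
        return hlen;
    }

    /* Compare the first 4 bytes of each candidate with one word compare,
       then check the remaining bytes only when the word matches. */
    uint32_t first = load32(needle);
    for (size_t i = 0; i <= last; i++) {
        if (load32(hay + i) == first &&
            memcmp(hay + i + 4, needle + 4, nlen - 4) == 0)
            return i;
    }
    return hlen;
}
```

Whether this beats a bytewise loop depends on the machine and the 
data, which is exactly what the numbers above fail to settle.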

Interesting side note: on my laptop Andrei find is faster than 
qznc find (with LDC), but on my desktop the order reverses (see 
above). Both machines are Intel i7s. I need to try a simpler 
processor; maybe wordwise is faster there. Alternatively, find 
is purely memory bound and the L1 cache makes every difference 
disappear.

Also, note how std find is faster than manual find! Finding a 
reliable benchmark is hard. :/