faster splitter

Sun May 29 05:27:09 PDT 2016

On Sunday, 29 May 2016 at 12:22:23 UTC, qznc wrote:
> I played around with the benchmark. Some more numbers:
>
> $ make ldc
> ldmd2 -O -release -inline -noboundscheck *.d -ofbenchmark.ldc
> ./benchmark.ldc
> E: wrong result with Chris find
> E: wrong result with Chris find
> E: wrong result with Chris find
>     std find: 153 ±25    +66 (1934)  -15 (7860)
>  manual find: 122 ±28    +80 (1812)  -17 (8134)
>    qznc find: 125 ±16    +18 (4644)  -15 (5126)
>   Chris find: 148 ±29    +75 (1976)  -18 (7915)
>  Andrei find: 114 ±23   +100 (1191)  -13 (8770)
>  (avg slowdown vs fastest; absolute deviation)
> $ make dmd
> dmd -O -release -inline -noboundscheck *.d -ofbenchmark.dmd
> ./benchmark.dmd
> E: wrong result with Chris find
> E: wrong result with Chris find
> E: wrong result with Chris find
>     std find: 160 ±27    +44 (3162)  -20 (6709)
>  manual find: 148 ±28    +54 (2701)  -19 (7178)
>    qznc find: 102 ±3     +27 ( 766)   -1 (9136)
>   Chris find: 175 ±30    +55 (2796)  -21 (7106)
>  Andrei find: 122 ±22    +46 (2554)  -14 (7351)
>  (avg slowdown vs fastest; absolute deviation)
>
> The additional numbers on the right are the ±MAD separated by 
> above or below the mean. For example Andrei find with ldc:
>
>   Andrei find: 114 ±23   +100 (1191)  -13 (8770)
>
> The mean slowdown is 114, which means 14% slower than the 
> fastest one. The mean absolute deviation (MAD) is 23. More 
> precisely, the mean deviation above the mean slowdown of 103 is 
> 100 and -13 below the mean slowdown. 1191 of the 10000 runs 
> were above the mean slowdown and 8770 below. The 39 missing 
> runs are equal to the mean slowdown.
>
> What bothers me is that changing the alphabet changes the 
> numbers so much. Currently, if you restrict the alphabets for 
> haystack and needle, the numbers change. The benchmark already 
> does a random subset on each run, but there is definitely a 
> bias.

You can avoid "E: wrong result with Chris find" by using

outer:
     for (auto i = 0; i < haystack.length-needle.length; i++)
     {
         if (haystack[i] != needle[0])
             continue;
         for (size_t j = i+1, k = 1; k < needle.length; ++j, ++k)
             if (haystack[j] != needle[k])
                 continue outer;
         return haystack[i..$];
     }

It's a tad faster.

I'm planning to test on more varied data and see where a bias may 
occur.