faster splitter

Tue May 24 05:38:04 PDT 2016

On 05/24/2016 06:44 AM, qznc wrote:
> On Tuesday, 24 May 2016 at 09:34:30 UTC, Chris wrote:
>> On Tuesday, 24 May 2016 at 07:54:38 UTC, qznc wrote:
>>> Apart from advanced algorithms, find should not be slower than a
>>> naive nested-for-loop implementation.
>>>
>>> [0] https://issues.dlang.org/show_bug.cgi?id=16066
>>
>> From Phobos [1]:
>>
>>         /* Preprocess #2: init skip[] */
>>         /* Note: This step could be made a lot faster.
>>         * A simple implementation is shown here. */
>>         this.skip = new size_t[needle.length];
>>         foreach (a; 0 .. needle.length)
>>         {
>>             size_t value = 0;
>>             while (value < needle.length
>>                    && !needlematch(needle, a, value))
>>             {
>>                 ++value;
>>             }
>>             this.skip[needle.length - a - 1] = value;
>>         }
>>
>> [1]
>> https://github.com/dlang/phobos/blob/master/std/algorithm/searching.d#L335
>>
>
> Unfortunately, it is slower. My current measurements [0]:
>
> $ ./benchmark 10000000 10 # large haystack, few iterations
> std find    took    753837231
> manual find took    129561980
> $ ./benchmark 10 10000000 # small haystack, many iterations
> std find    took    721676721
> manual find took    216418870
>
> The nested-for-loop implementation is roughly 4 times faster!
>
> Caveat: I'm not completely sure my benchmark is fair yet.
>
> [0] https://github.com/qznc/find-is-too-slow

One somewhat unrelated thing that can be done with Boyer-Moore: use a 
constant-length skip array instead of dynamic allocation. Upon 
saturation, continue with brute force. -- Andrei