faster splitter

Chris via Digitalmars-d digitalmars-d at puremagic.com
Sat May 28 07:18:10 PDT 2016


On Saturday, 28 May 2016 at 12:47:59 UTC, Andrei Alexandrescu 
wrote:
> On 5/28/16 6:56 AM, qznc wrote:
>> The sentinel value is `needleBack+1`, but range elements need 
>> not
>> support addition. Finding a sentinel is hard and most 
>> certainly requires
>> more assumptions about the ranges.
>
> No need for a sentinel really so long as you first search for 
> the last element of the needle, in the haystack.
>
> Allow me to put on the table a simple brute force routine, 
> specialized for arrays with copyable elements, no custom 
> predicate etc. Far as I can tell it does no redundant work:
>
> T[] find(T)(T[] haystack, T[] needle)
> {
>     if (needle.length == 0) return haystack;
>     immutable lastIndex = needle.length - 1;
>     auto last = needle[lastIndex];
>     size_t j = lastIndex;
>     for (; j < haystack.length; ++j)
>     {
>         if (haystack[j] != last) continue;
>         immutable k = j - lastIndex;
>         // last elements match
>         for (size_t i = 0; ; ++i)
>         {
>             if (i == lastIndex) return haystack[k .. $];
>             if (needle[i] != haystack[k + i]) break;
>         }
>     }
>     return haystack[$ .. $];
> }
>
> unittest
> {
>     string s1 = "hello abc world";
>     assert(find(s1, "abc") == "abc world");
>     assert(find(s1, "def") == "");
> }
>
> void main(){}
>
>
> Andrei

I included your function (I also fixed the bug in my 
findStringS_). Here's a typical result:

std find:    208 ±61
manual find: 127 ±29
qznc find:   108 ±13
Chris find:  140 ±32
Andrei find: 126 ±27
=====
std find:    207 ±59
manual find: 128 ±30
qznc find:   108 ±13
Chris find:  141 ±32
Andrei find: 125 ±27

I used dmd, because I don't have ldc on my laptop. qznc's find is 
clearly the winner.





More information about the Digitalmars-d mailing list