Success! Find is slow no more

Thu May 23 13:42:45 UTC 2019

A few comments within the code:

> T[] faster_find(T)(T[] haystack, T[] needle)
> {
>      if (needle.length == 0) return haystack[$..$];

This line should return haystack[0 .. 0] (an empty string matches at the 
beginning). Distinction is mainly academic but it will reduce the size 
of the function.

>      if (needle.length > haystack.length) return haystack[$..$];

I think this falls off naturally so no need to treat it as a special case.

(A good case to treat as a special case is needle.length == 1.)

>      immutable end = needle.length - 1;
>      size_t j = end, skip = 0;
> 
>      while(++j < haystack.length)

Whenever I see ++something I think "the first increment here is 
useless". By the same token it could happen that with something++ the 
last increment is useless. The difference is that the latter has fewer 
dependencies when it's also part of a comparison. I suggest looking into 
replacing ++j with starting at needle.length and doing the loop slightly 
differently.

>      {
>          if (haystack[j] != needle[end]) continue;

Inside this loop you have already checked the bounds so if the compiler 
generates them you may want to use .ptr[index]. There's some @trusted 
gymnastics you need to add.

>          for(size_t i = 1, k = 0; i < needle.length; ++i, ++k) {

Here you have two iteration variables that always differ by 1. You may 
want to use only one iteration variable. (I'm not sure this will improve 
matters for built-in types as there's enough registers.)

>              if (haystack[j - needle.length + i] != needle[k]) {
>                  while (skip < end && needle[end - 1 - skip] != 
> needle[end]) { ++skip; }
>                  j += skip;
>                  goto main_loop2;
>              }
>          }
>          return haystack[j - end .. $];
>      }
>      return haystack[$ .. $];
> 
>      main_loop2: while(++j < haystack.length)

Looks like the two versions of the loop differ only by the computation 
of skip. The code would be smaller (and therefore potentially faster) if 
skip were initialized e.g. to size_t.max and then initialized once. The 
test skip != size_t.max will fail only once and after that it will be 
reliably speculated.