Fast linear search in D array (slice)

Wed Nov 7 21:29:47 UTC 2018

On Wednesday, 7 November 2018 at 21:06:14 UTC, Per Nordlöw wrote:
> I'm currently implementing overloads of some Phobos's search 
> algorithms where the haystack is a `char[]` and the needle is 
> an ASCII-clean `char`. I plan to integrate them to Phobos later 
> down the road.
>
> One such overload is at
>
> https://github.com/nordlow/phobos-next/blob/master/src/find_split_ex.d#L10
>
> which currently implements the linear search as
>
> foreach (immutable offset; 0 .. haystack.length)
> {
>     const bool hit = haystack[offset] == needles[0];
>     if (hit)
>     {
>         return Result(haystack, offset);
>     }
> }
>
> I now have a couple questions regarding the performance of the 
> above foreach-loop:
>
> - Is this the preferred way of implementing a linear search in 
> D?
> - Andrei has mentioned that sentinel-based search can be faster 
> if the haystack is allowed to be temporarily mutated at the 
> end. But I presume such a solution is only faster above a 
> certain length for the haystack. Has anybody benchmarked and 
> found a suitable length limit for this?
> - As a special-case of the above there's also the possibility 
> of mutating the last character to be a null and then reusing 
> `strchr`.
> - When is it preferred to search in blocks of 2, 4, 8 or 16 
> bytes? I presume for a certain alignment and, again, length 
> limit.
> - Would it be relevant to make such a function a compiler 
> intrinsic that handles all these cases for the generic 
> non-decoding case where haystack is a `T[]` and needle a `T`?

Have you thought about using simd for this?