faster splitter
Chris via Digitalmars-d
digitalmars-d at puremagic.com
Thu Jun 2 08:13:57 PDT 2016
On Wednesday, 1 June 2016 at 14:32:38 UTC, Chris wrote:
For fun, I've written a search function that alternates between
the beginning and the end of a string. I'm basically using
Andrei's search algorithm for this. It is, of course only useful,
if `needle` is expected to be either towards the beginning or the
end of a string. If `needle` occurs multiple times, it matches
where ever it encounters it first.
T[] findUnique(T)(T[] haystack, T[] needle)
{
static size_t computeSkip(T[] needle)
{
size_t result = 1;
while (result < needle.length && needle[$ - 1 - result] !=
needle[$ - 1])
{
++result;
}
return result;
}
if (needle.length == 0) return haystack;
immutable size_t len = haystack.length;
immutable size_t lastIndex = needle.length - 1;
auto last = needle[lastIndex];
auto first = needle[0];
size_t skip = 0;
for (size_t i = lastIndex, j = len-lastIndex-1; i < len; ++i,
--j)
{
if (last != haystack[i])
goto back;
for (size_t k = 0; ; ++k)
{
if (k == lastIndex) return haystack[i-lastIndex..$];
if (haystack[i - lastIndex + k] != needle[k]) break;
}
back:
if (i > j) break;
if (first != haystack[j])
continue;
for (size_t k = lastIndex; ; --k)
{
if (k == 0) return haystack[j..$];
if (haystack[j + k] != needle[k]) break;
}
if (skip == 0) skip = computeSkip(needle);
i += skip;
j -= skip-1;
}
return haystack[$..$];
}
unittest
{
string s1 = "Hello abc world!";
assert (findUnique(s1, "abc") == "abc world!");
assert (findUnique(s1, "def") == "");
string s2 = "Ola, mundo!";
assert (findUnique(s2, "do!") == "do!");
string s3 = "abc";
assert (findUnique(s3, "c") == "c");
assert (findUnique(s3, "z") == "");
string s4 = "aaabaaa";
assert (findUnique(s4, "b") == "baaa");
}
I added it to the benchmarking suite (I had to turn off the error
warnings, see description above). Naturally, it shines in the
first test, as the needle is at the end of the text. The whole
thing is to be taken with a grain of salt ;)
Search in Alice in Wonderland
std: 535 ±8
manual: 454 ±10
qznc: 421 ±5
Chris: 460 ±10
Andrei: 643 ±19
Andrei2: 314 ±6
Andrei3: 352 ±5
Biloop: 100 ±0
Search in random short strings
std: 219 ±33
manual: 194 ±47
qznc: 123 ±11
Chris: 219 ±69
Andrei: 113 ±10
Andrei2: 104 ±4
Andrei3: 103 ±3
Biloop: 195 ±46
Mismatch in random long strings
std: 193 ±30
manual: 249 ±103
qznc: 156 ±30
Chris: 260 ±108
Andrei: 238 ±97
Andrei2: 107 ±11
Andrei3: 115 ±12
Biloop: 141 ±44
Search random haystack with random needle
std: 189 ±25
manual: 193 ±57
qznc: 152 ±26
Chris: 203 ±59
Andrei: 176 ±35
Andrei2: 103 ±6
Andrei3: 110 ±8
Biloop: 141 ±28
(avg slowdown vs fastest; absolute deviation)
CPU ID: GenuineIntel Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
More information about the Digitalmars-d
mailing list