faster splitter

Chris via Digitalmars-d digitalmars-d at puremagic.com
Thu Jun 2 08:13:57 PDT 2016


On Wednesday, 1 June 2016 at 14:32:38 UTC, Chris wrote:

For fun, I've written a search function that alternates between 
the beginning and the end of a string. I'm basically using 
Andrei's search algorithm for this. It is, of course only useful, 
if `needle` is expected to be either towards the beginning or the 
end of a string. If `needle` occurs multiple times, it matches 
where ever it encounters it first.

T[] findUnique(T)(T[] haystack, T[] needle)
{
	static size_t computeSkip(T[] needle)
	{
		size_t result = 1;
		while (result < needle.length && needle[$ - 1 - result] != 
needle[$ - 1])
		{
			++result;
		}
		return result;
	}
	
	if (needle.length == 0) return haystack;
	immutable size_t len = haystack.length;
	immutable size_t lastIndex = needle.length - 1;
	auto last = needle[lastIndex];
	auto first = needle[0];
	size_t skip = 0;
	for (size_t i = lastIndex, j = len-lastIndex-1; i < len; ++i, 
--j)
	{
		if (last != haystack[i])
			goto back;
		for (size_t k = 0; ; ++k)
		{
			if (k == lastIndex) return haystack[i-lastIndex..$];
			if (haystack[i - lastIndex + k] != needle[k]) break;
		}
		back:
		if (i > j) break;
		if (first != haystack[j])
			continue;
		for (size_t k = lastIndex; ; --k)
		{
			if (k == 0) return haystack[j..$];
			if (haystack[j + k] != needle[k]) break;
		}
		if (skip == 0) skip = computeSkip(needle);
		i += skip;
     j -= skip-1;
	}
	return haystack[$..$];
}

unittest
{
	string s1 = "Hello abc world!";
	assert (findUnique(s1, "abc") == "abc world!");
	assert (findUnique(s1, "def") == "");
	string s2 = "Ola, mundo!";
	assert (findUnique(s2, "do!") == "do!");
	string s3 = "abc";
	assert (findUnique(s3, "c") == "c");
	assert (findUnique(s3, "z") == "");
	string s4 = "aaabaaa";
	assert (findUnique(s4, "b") == "baaa");
}

I added it to the benchmarking suite (I had to turn off the error 
warnings, see description above). Naturally, it shines in the 
first test, as the needle is at the end of the text. The whole 
thing is to be taken with a grain of salt ;)

Search in Alice in Wonderland
        std: 535 ±8
     manual: 454 ±10
       qznc: 421 ±5
      Chris: 460 ±10
     Andrei: 643 ±19
    Andrei2: 314 ±6
    Andrei3: 352 ±5
     Biloop: 100 ±0
Search in random short strings
        std: 219 ±33
     manual: 194 ±47
       qznc: 123 ±11
      Chris: 219 ±69
     Andrei: 113 ±10
    Andrei2: 104 ±4
    Andrei3: 103 ±3
     Biloop: 195 ±46
Mismatch in random long strings
        std: 193 ±30
     manual: 249 ±103
       qznc: 156 ±30
      Chris: 260 ±108
     Andrei: 238 ±97
    Andrei2: 107 ±11
    Andrei3: 115 ±12
     Biloop: 141 ±44
Search random haystack with random needle
        std: 189 ±25
     manual: 193 ±57
       qznc: 152 ±26
      Chris: 203 ±59
     Andrei: 176 ±35
    Andrei2: 103 ±6
    Andrei3: 110 ±8
     Biloop: 141 ±28
  (avg slowdown vs fastest; absolute deviation)
CPU ID: GenuineIntel Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz



More information about the Digitalmars-d mailing list