faster splitter

Chris via Digitalmars-d digitalmars-d at puremagic.com
Fri May 27 07:06:09 PDT 2016


On Friday, 27 May 2016 at 13:29:00 UTC, Andrei Alexandrescu wrote:
> On 5/27/16 8:28 AM, Chris wrote:
>
> This is surprising. It should be easy to construct examples 
> where brute force search performs arbitrarily poor, e.g. 
> searching for "aaaaaaa...aab" in a long string with only "a"s.

I will look into this. Atm, I'm using qznc's benchmark.

>> NB: I've found that `foreach` is usually faster than a manual 
>> `for`-loop.
>
> That's surprising too. I'd be interested to look at a 
> disassembly if you have one.

I have to correct myself. A manual loop is faster, I couldn't 
believe it either, so I benchmarked the same function with 
`foreach` and `for`. Turns out that `for` is _way_ faster.

>> I used qznc's benchmarking, and it's clear that the current
>> implementation of std.algorithm.find is quite a poor 
>> performer. We do
>> have to work on that. Library functions should be fast, no 
>> matter what.
>> If someone uses `find`, they wanna be sure it's optimized for 
>> speed.
>
> That's definitely the way to go. Thanks for looking into it!
>
>
> Andrei

There is a bug, if it is one, in qznc's `manual_find`. It doesn't 
find strings at the end of a string, as "ing" in "string", and it 
does not find a string that is equal to the search string as in 
"abba" == "abba".

This outperforms both `manual_find` and the improved `std find`

string findStringS_Manual(string haystack, string needle)
{
	
	if (needle.length > haystack.length)
		return haystack[$..$];
	outer:
	for (auto i = 0; i < haystack.length; i++)
	{
		if (haystack[i] != needle[0])
			continue;
		for (size_t j = i+1, k = 1; k < needle.length; ++j, ++k)
			if (haystack[j] != needle[k])
			continue outer;
		return haystack[i..$];
	}
	return haystack[$..$];
}

A typical example would be:

std find    	took     25642297
manual find 	took      7426676  // manual_find
my std find 	took      6654496  // improved find
findStringS 	took      7159556  // foreach(i, c; haystack)
findStringS_  	took      5315498  // for (auto i = 0 ...) see 
above

I think it's clear that `std find` is veeeery slow. If anyone 
wants to test my function, please do so. I don't want to spread 
wrong data, but this is what I get at the moment (ldc latest 
version).


More information about the Digitalmars-d mailing list