faster splitter
Chris via Digitalmars-d
digitalmars-d at puremagic.com
Fri May 27 07:06:09 PDT 2016
On Friday, 27 May 2016 at 13:29:00 UTC, Andrei Alexandrescu wrote:
> On 5/27/16 8:28 AM, Chris wrote:
>
> This is surprising. It should be easy to construct examples
> where brute force search performs arbitrarily poor, e.g.
> searching for "aaaaaaa...aab" in a long string with only "a"s.
I will look into this. Atm, I'm using qznc's benchmark.
>> NB: I've found that `foreach` is usually faster than a manual
>> `for`-loop.
>
> That's surprising too. I'd be interested to look at a
> disassembly if you have one.
I have to correct myself. A manual loop is faster, I couldn't
believe it either, so I benchmarked the same function with
`foreach` and `for`. Turns out that `for` is _way_ faster.
>> I used qznc's benchmarking, and it's clear that the current
>> implementation of std.algorithm.find is quite a poor
>> performer. We do
>> have to work on that. Library functions should be fast, no
>> matter what.
>> If someone uses `find`, they wanna be sure it's optimized for
>> speed.
>
> That's definitely the way to go. Thanks for looking into it!
>
>
> Andrei
There is a bug, if it is one, in qznc's `manual_find`. It doesn't
find strings at the end of a string, as "ing" in "string", and it
does not find a string that is equal to the search string as in
"abba" == "abba".
This outperforms both `manual_find` and the improved `std find`
string findStringS_Manual(string haystack, string needle)
{
if (needle.length > haystack.length)
return haystack[$..$];
outer:
for (auto i = 0; i < haystack.length; i++)
{
if (haystack[i] != needle[0])
continue;
for (size_t j = i+1, k = 1; k < needle.length; ++j, ++k)
if (haystack[j] != needle[k])
continue outer;
return haystack[i..$];
}
return haystack[$..$];
}
A typical example would be:
std find took 25642297
manual find took 7426676 // manual_find
my std find took 6654496 // improved find
findStringS took 7159556 // foreach(i, c; haystack)
findStringS_ took 5315498 // for (auto i = 0 ...) see
above
I think it's clear that `std find` is veeeery slow. If anyone
wants to test my function, please do so. I don't want to spread
wrong data, but this is what I get at the moment (ldc latest
version).
More information about the Digitalmars-d
mailing list