faster splitter

qznc via Digitalmars-d digitalmars-d at puremagic.com
Tue May 24 00:54:38 PDT 2016


On Monday, 23 May 2016 at 22:19:18 UTC, Andrei Alexandrescu wrote:
> On 05/23/2016 03:11 PM, qznc wrote:
>> Actually, std find should be faster, since it could use the 
>> Boyer Moore
>> algorithm instead of naive string matching.
>
> Conventional wisdom has it that find() is brute force and 
> that's that, but probably it's time to destroy. Selectively 
> using advanced searching algorithms for the appropriate inputs 
> is very DbI-ish.
>
> There are a few nice precedents of blend algorithms, see e.g. 
> http://effbot.org/zone/stringlib.htm.
>
> Writing a generic subsequence search blend algorithm, one that 
> chooses the right algorithm based on a combination of static 
> and dynamic decisions, is quite a fun and challenging project. 
> Who wanna?

I observed that Boyer-Moore from std is still slower. My guess is 
due to the relatively short strings in my benchmark and the need 
to allocate for the tables. Knuth-Morris-Pratt might be 
worthwhile, because only requires a smaller table, which could be 
allocated on the stack.

The overall sentiment seems to be that KMP vs BM depends on the 
input. This means an uninformed find function could only use 
heuristics. Anyway, Phobos could use a KnuthMorrisPrattFinder 
implementation next to the BoyerMooreFinder. I filed an issue [0].

Apart from advanced algorithms, find should not be slower than a 
naive nested-for-loop implementation.

[0] https://issues.dlang.org/show_bug.cgi?id=16066


More information about the Digitalmars-d mailing list