faster splitter
Andrei Alexandrescu via Digitalmars-d
digitalmars-d at puremagic.com
Tue May 24 05:51:23 PDT 2016
On 05/24/2016 03:54 AM, qznc wrote:
> On Monday, 23 May 2016 at 22:19:18 UTC, Andrei Alexandrescu wrote:
>> On 05/23/2016 03:11 PM, qznc wrote:
>>> Actually, std find should be faster, since it could use the Boyer Moore
>>> algorithm instead of naive string matching.
>>
>> Conventional wisdom has it that find() is brute force and that's that,
>> but probably it's time to destroy. Selectively using advanced
>> searching algorithms for the appropriate inputs is very DbI-ish.
>>
>> There are a few nice precedents of blend algorithms, see e.g.
>> http://effbot.org/zone/stringlib.htm.
>>
>> Writing a generic subsequence search blend algorithm, one that chooses
>> the right algorithm based on a combination of static and dynamic
>> decisions, is quite a fun and challenging project. Who wanna?
>
> I observed that Boyer-Moore from std is still slower. My guess is due to
> the relatively short strings in my benchmark and the need to allocate
> for the tables. Knuth-Morris-Pratt might be worthwhile, because only
> requires a smaller table, which could be allocated on the stack.
There's also Rabin-Karp that can be used when the sought range is
relatively short.
> The overall sentiment seems to be that KMP vs BM depends on the input.
> This means an uninformed find function could only use heuristics.
> Anyway, Phobos could use a KnuthMorrisPrattFinder implementation next to
> the BoyerMooreFinder. I filed an issue [0].
Great. I keep on thinking how the advanced algorithms should "kick in"
only after a partial match was long enough and frequent enough. As long
as the partial match is short and infrequent, brute force will do best.
> Apart from advanced algorithms, find should not be slower than a naive
> nested-for-loop implementation.
>
> [0] https://issues.dlang.org/show_bug.cgi?id=16066
Nice, thanks.
Andrei
More information about the Digitalmars-d
mailing list