faster splitter

Chris via Digitalmars-d digitalmars-d at puremagic.com
Tue May 24 02:34:30 PDT 2016


On Tuesday, 24 May 2016 at 07:54:38 UTC, qznc wrote:
> On Monday, 23 May 2016 at 22:19:18 UTC, Andrei Alexandrescu 
> wrote:
>> On 05/23/2016 03:11 PM, qznc wrote:
>>> Actually, std find should be faster, since it could use the 
>>> Boyer Moore
>>> algorithm instead of naive string matching.
>>
>> Conventional wisdom has it that find() is brute force and 
>> that's that, but probably it's time to destroy. Selectively 
>> using advanced searching algorithms for the appropriate inputs 
>> is very DbI-ish.
>>
>> There are a few nice precedents of blend algorithms, see e.g. 
>> http://effbot.org/zone/stringlib.htm.
>>
>> Writing a generic subsequence search blend algorithm, one that 
>> chooses the right algorithm based on a combination of static 
>> and dynamic decisions, is quite a fun and challenging project. 
>> Who wanna?
>
> I observed that Boyer-Moore from std is still slower. My guess 
> is due to the relatively short strings in my benchmark and the 
> need to allocate for the tables. Knuth-Morris-Pratt might be 
> worthwhile, because only requires a smaller table, which could 
> be allocated on the stack.
>
> The overall sentiment seems to be that KMP vs BM depends on the 
> input. This means an uninformed find function could only use 
> heuristics. Anyway, Phobos could use a KnuthMorrisPrattFinder 
> implementation next to the BoyerMooreFinder. I filed an issue 
> [0].
>
> Apart from advanced algorithms, find should not be slower than 
> a naive nested-for-loop implementation.
>
> [0] https://issues.dlang.org/show_bug.cgi?id=16066

 From Phobos [1]:

         /* Preprocess #2: init skip[] */
         /* Note: This step could be made a lot faster.
         * A simple implementation is shown here. */
         this.skip = new size_t[needle.length];
         foreach (a; 0 .. needle.length)
         {
             size_t value = 0;
             while (value < needle.length
                    && !needlematch(needle, a, value))
             {
                 ++value;
             }
             this.skip[needle.length - a - 1] = value;
         }

[1] 
https://github.com/dlang/phobos/blob/master/std/algorithm/searching.d#L335


More information about the Digitalmars-d mailing list