faster splitter
Chris via Digitalmars-d
digitalmars-d at puremagic.com
Tue May 24 02:34:30 PDT 2016
On Tuesday, 24 May 2016 at 07:54:38 UTC, qznc wrote:
> On Monday, 23 May 2016 at 22:19:18 UTC, Andrei Alexandrescu
> wrote:
>> On 05/23/2016 03:11 PM, qznc wrote:
>>> Actually, std find should be faster, since it could use the
>>> Boyer Moore
>>> algorithm instead of naive string matching.
>>
>> Conventional wisdom has it that find() is brute force and
>> that's that, but probably it's time to destroy. Selectively
>> using advanced searching algorithms for the appropriate inputs
>> is very DbI-ish.
>>
>> There are a few nice precedents of blend algorithms, see e.g.
>> http://effbot.org/zone/stringlib.htm.
>>
>> Writing a generic subsequence search blend algorithm, one that
>> chooses the right algorithm based on a combination of static
>> and dynamic decisions, is quite a fun and challenging project.
>> Who wanna?
>
> I observed that Boyer-Moore from std is still slower. My guess
> is due to the relatively short strings in my benchmark and the
> need to allocate for the tables. Knuth-Morris-Pratt might be
> worthwhile, because only requires a smaller table, which could
> be allocated on the stack.
>
> The overall sentiment seems to be that KMP vs BM depends on the
> input. This means an uninformed find function could only use
> heuristics. Anyway, Phobos could use a KnuthMorrisPrattFinder
> implementation next to the BoyerMooreFinder. I filed an issue
> [0].
>
> Apart from advanced algorithms, find should not be slower than
> a naive nested-for-loop implementation.
>
> [0] https://issues.dlang.org/show_bug.cgi?id=16066
From Phobos [1]:
/* Preprocess #2: init skip[] */
/* Note: This step could be made a lot faster.
* A simple implementation is shown here. */
this.skip = new size_t[needle.length];
foreach (a; 0 .. needle.length)
{
size_t value = 0;
while (value < needle.length
&& !needlematch(needle, a, value))
{
++value;
}
this.skip[needle.length - a - 1] = value;
}
[1]
https://github.com/dlang/phobos/blob/master/std/algorithm/searching.d#L335
More information about the Digitalmars-d
mailing list