Replacing tango.text.Ascii.isearch

Siarhei Siamashka siarhei.siamashka at gmail.com
Thu Oct 6 08:15:10 UTC 2022


On Wednesday, 5 October 2022 at 21:50:32 UTC, torhu wrote:
> I did some basic testing, and regex was two orders of magnitude 
> faster. So now I know, I guess.

Substring search functionality is currently in a very bad shape 
in Phobos. I discovered this myself a few weeks ago when I was 
trying to solve the 
https://www.facebook.com/codingcompetitions/hacker-cup/2022/round-1/problems/A2 puzzle using D language. A part of the solution requires a fast substring search (actually a subarray search) and Phobos doesn't offer anything with even remotely acceptable performance.

Phobos does have a Boyer-Moore implementation. This algorithm is 
historically famous in computer science as one of the first 
attempts to optimize substring search, but it also has 
pathologically bad performance on certain input data and probably 
shouldn't be recommended for any practical use nowadays. The 
users of old versions of Python discovered this too: 
https://codeforces.com/blog/entry/106849?#comment-952483

The standard 'find' function from the following simple example 
also becomes awfully slow when arrays 'a' and 'b' are large 
and/or are adversarially constructed:

```D
import std;
void main() {
   auto a = [1, 2, 3, 4];
   auto b = [2, 3];
   writefln("Is %s a subarray of %s? The answer is %s.",
            b, a, a.find(b).empty ? "no" : "yes");
}
```

I think that the best fit for your use case is the 
https://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm and there's an old issue in bugzilla about this: https://issues.dlang.org/show_bug.cgi?id=16066

BTW, if anyone is curious, one of the possible solutions for the 
hacker-cup/2022/round-1/problems/A2 puzzle in D language can be 
found here: 
https://codeforces.com/blog/entry/106768?#comment-952808


More information about the Digitalmars-d-learn mailing list