Replacing tango.text.Ascii.isearch

Siarhei Siamashka siarhei.siamashka at gmail.com
Fri Oct 7 22:50:04 UTC 2022


On Friday, 7 October 2022 at 12:19:59 UTC, bachmeier wrote:
> https://www.cs.utexas.edu/users/moore/best-ideas/string-searching/
>
> "the longer the pattern is, the faster the algorithm goes"

Yes, that's how substring search works in the standard libraries 
of the other programming languages. Now please take the torhu's 
test program (posted a few comments above) and generate input for 
it using

     python -c "print(('a' * 49 + 'b') * 20000)" > test.lst

Run it to search for 
"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa" (the 
character 'a' replicated 50 times) and then compare its 
performance against the other similar programs doing the same 
thing:

Python:

```Python
import sys
assert len(sys.argv) >= 2, "Need a needle argument."
needle = sys.argv[1].lower()
print(sum([1 if line.lower().find(needle) != -1 else 0 for line 
in open("test.lst")]))
```

Ruby/Crystal:

```Ruby
abort "Need a needle argument." unless ARGV.size >= 1
needle = ARGV[0].downcase
puts File.open("test.lst").each_line.sum {|line| 
line.downcase.index(needle) ? 1 : 0 }
```

A generic substring search is tuned to be fast on any input in 
the other programming languages. But in Phobos a simpleminded 
slow search algorithm is used by default. The Boyer-Moore 
algorithm can be used too if it's explicitly requested, but it 
has some gotchas of its own.


More information about the Digitalmars-d-learn mailing list