Search a file skipping whitespace

Johann MacDonagh johann.macdonagh.no at spam.gmail.com
Sat Jul 16 14:28:13 PDT 2011


On 7/16/2011 5:07 PM, Dmitry Olshansky wrote:
>
> Let's drill down to the problem through this barrage of crap:
>
> the problem statement is
>
> Error: cannot implicitly convert expression (haystack) of type dchar[]
> to string
>
> So (apparently) the problem is that after array(filter!(... you get an
> array of dchars (Unicode code points) as a result of filtering a string
> (which is UTF-8 under the hood, btw), while you are going to search a
> UTF-8 string.
> And a UTF-8 string is (once again) not random-access in the sense of
> code points (it needs a UTF decode, though that's clearly not needed in
> your case).
> The simplest workaround I can think of is to convert the needle to a dstring:
> auto needle = boyerMooreFinder(to!dstring(args[1])); //found in std.conv
>
>

Nope, that doesn't work.

Here's what he should do:

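import std.algorithm;
import std.array;
import std.file;
import std.stdio;

void main(string[] args) {
	// args[1] is the needle; bail out instead of indexing past the end.
	if (args.length < 2) {
		writeln("usage: ", args[0], " <needle>");
		return;
	}
	foreach (string name; dirEntries(".", SpanMode.shallow)) {
		// endsWith also copes with names shorter than the suffix.
		if (name.endsWith(".txt")) {
			writeln(name);
			string text = readText(name);
			// Lazily keep only the digits; nothing is copied here.
			auto haystack = filter!("a >= '0' && a <= '9'")(text);
			auto result = find(haystack, args[1]);
			writeln(result);
		}
	}
}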

If you call array() on the filter you've already done O(n) work (and 
allocated a copy), so you're not going to get any performance benefit 
from being able to do Boyer-Moore afterwards.
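
To make the array() point concrete, here is a minimal sketch of where the dchar[] in the error message comes from. digitsOf is a hypothetical helper I made up for illustration; it is not part of the original program.

```d
import std.algorithm : filter;
import std.array : array;

// Hypothetical helper, not from the original program: eagerly collects
// the digits of text. array() walks the whole input and allocates.
dchar[] digitsOf(string text)
{
    auto lazyDigits = filter!(a => a >= '0' && a <= '9')(text); // lazy, no copy
    return array(lazyDigits); // element type is dchar, so this is dchar[]
}

void main()
{
    auto eager = digitsOf("a1 b2\tc3");
    static assert(is(typeof(eager) == dchar[]));
    assert(eager == "123"d);
    // string s = eager; // would not compile: dchar[] is not a string
}
```

The filter itself does no work until something iterates it; it is the array() call that pays for a full pass and a new buffer.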

Correct me if I'm wrong, but I believe find() will do Boyer-Moore if it 
knows its haystack is random access. This is the power of template 
programming. find() can generate an efficient find algorithm if it knows 
the range is random access, or default to a less efficient one if not.
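
Whether or not find() really ends up doing Boyer-Moore, the compile-time dispatch I'm describing can be sketched roughly like this. strategyFor is a hypothetical function written for this post, not anything in Phobos; only isRandomAccessRange, isInputRange and filter are real library names.

```d
import std.algorithm : filter;
import std.range.primitives : isInputRange, isRandomAccessRange;
import std.stdio : writeln;

// Hypothetical illustration: reports which branch a find-like template
// would take for a given range type. The choice is made at compile time.
string strategyFor(R)(R range)
if (isInputRange!R)
{
    static if (isRandomAccessRange!R)
        return "random access";
    else
        return "sequential";
}

void main()
{
    dstring wide = "a1b2c3"d;  // one element per code point
    string narrow = "a1b2c3";  // UTF-8, which Phobos does not treat as random access
    auto digits = filter!(a => a >= '0' && a <= '9')(narrow); // lazy filter

    writeln(strategyFor(wide));   // random access
    writeln(strategyFor(narrow)); // sequential
    writeln(strategyFor(digits)); // sequential
}
```

Only one branch of the static if is compiled for each range type, so the slower path costs nothing when the faster one applies.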

