Search a file skipping whitespace

Dmitry Olshansky dmitry.olsh at gmail.com
Sat Jul 16 14:07:20 PDT 2011


On 17.07.2011 0:41, Willy Martinez wrote:
> == Quote from Dmitry Olshansky (dmitry.olsh at gmail.com)'s article
>> If you wish to avoid storing all of this in an array by using e.g.
>> filter _and_  use Boyer-Moore search on it then: No, you can't do that.
>> The reason is that filter is a ForwardRange, with the important consequence
>> that you can't look at an arbitrary Nth element in O(1). And Boyer-Moore
>> requires such access to be anywhere near efficient.
>> Why doesn't filter provide O(1) random access? Because to get the Nth
>> element you'd need to check at least N (and potentially an unlimited)
>> number of elements before it, in case they get filtered out.
>>> Any help?
>> If I had this sort of problem I'd use something along these lines:
>> auto file = File("yourfile");
>> foreach (line; file.byLine())
>> {
>>       // this copies all non-whitespace characters to a new array
>>       auto onlyDigits = array(filter!((x){ return !isWhite(x); })(line));
>>       auto result = find(onlyDigits, ... ); // your query here
>>       ///....
>> }
>>> Thanks
> I don't mind storing it in memory. Each .txt file is around 20MB so the filtered
> string should be even smaller.
>
> Still, calling array gives this error:

It's not exactly the call to array that fails, but I perfectly understand
why you took it for the culprit.

>
> ..\..\src\phobos\std\algorithm.d(3252): Error: function
> std.algorithm.BoyerMooreFinder!(result,string).BoyerMooreFinder.beFound (string
> haystack) is not callable using argument types (dchar[])
> ..\..\src\phobos\std\algorithm.d(3252): Error: cannot implicitly convert
> expression (haystack) of type dchar[] to string
> ..\..\src\phobos\std\algorithm.d(3252): Error: cannot implicitly convert
> expression (needle.beFound((__error))) of type string to dchar[] search_seq.d(13):
> Error: template instance std.algorithm.find!(dchar[],result,string) error
> instantiating

Let's drill down to the problem through this barrage of errors.

The actual problem statement is:

  Error: cannot implicitly convert expression (haystack) of type dchar[] to string

So (apparently) the problem is that after array(filter!(...)) you get an
array of dchars (Unicode code points) as the result of filtering a string
(which is UTF-8 under the hood, btw), while the needle you are searching for
is still a UTF-8 string.
And a UTF-8 string is (once again) not random-accessible in terms of code
points (it needs UTF decoding, though that's clearly not needed in your case).
The simplest workaround I can think of is to convert the needle to dstring:
auto needle = boyerMooreFinder(to!dstring(args[1])); // to is found in std.conv
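
To make the type matching concrete, here is a minimal sketch of the same
idea as a standalone program. The helper name findDigits and the sample
strings are made up for illustration. One assumption beyond the line above:
judging by the beFound signature in the error message, the finder seems to
expect the haystack to have the same type as the needle, so the sketch also
turns the filtered dchar[] into a dstring with .idup.

```d
import std.algorithm : filter, find, boyerMooreFinder;
import std.array : array;
import std.conv : to;
import std.stdio : writeln;

// Hypothetical helper: keep only ASCII digits of text, then search for pattern.
dstring findDigits(string text, string pattern)
{
    // Filtering a string yields dchar elements, so array() returns dchar[].
    // .idup makes it immutable(dchar)[], i.e. dstring, matching the needle.
    dstring haystack = array(filter!(c => c >= '0' && c <= '9')(text)).idup;

    // Needle converted to UTF-32 as well, so both sides agree.
    auto needle = boyerMooreFinder(to!dstring(pattern));

    // Returns the haystack starting at the first match, or empty if none.
    return find(haystack, needle);
}

void main()
{
    // Stand-ins for readText(name) and args[1].
    writeln(findDigits("12 34\n5 6 7 8 9", "4567"));
}
```

The price is 4 bytes per digit in memory, which should still be bearable
for files of around 20MB.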


>
>
>  From this code:
>
> import std.algorithm;
> import std.array;
> import std.file;
> import std.stdio;
>
> void main(string[] args) {
> 	auto needle = boyerMooreFinder(args[1]);
> 	foreach (string name; dirEntries(".", SpanMode.shallow)) {
> 		if (name[$-3 .. $] == "txt") {
> 			writeln(name);
> 			string text = readText(name);
> 			auto haystack = array(filter!("a>= '0'&&  a<= '9'")(text));
> 			auto result = find(haystack, needle);
> 			writeln(result);
> 		}
> 	}
> }
>
>
> I'm using DMD 2.054 on Windows if that helps


-- 
Dmitry Olshansky



More information about the Digitalmars-d-learn mailing list