Search a file skipping whitespace

Dmitry Olshansky dmitry.olsh at gmail.com
Sat Jul 16 13:06:29 PDT 2011


On 16.07.2011 23:05, Willy Martinez wrote:
> Hello. I'm new to D but bear with me, please.
>
> I have several files that look like this:
>
> 71104 08924 72394 13995 49707 98696
> 48245 08311 44066 67172 56025 07952
> 00384 37808 90166 13871 94258 37216
>
> I'm trying to read those files and search for sequences of digits inside,
> hopefully with the Boyer-Moore implementation in std.algorithm.
>
> Right now I have made a small script that iterates over the .txt files in the
> current directory and reads line by line and uses find on it.
>
> But I haven't been able to write a range that removes the whitespace and can
> be used with find. It should generate one long stream of digits like:
>
> 711040892472394139954970798696482450831144066671725602507952003843780890166138719425837216

If you want to avoid storing all of this in an array (e.g. by using 
filter) _and_ run a Boyer-Moore search on it, then no, you can't do that. 
The reason is that filter gives you only a ForwardRange, with the important 
consequence that you can't look at an arbitrary Nth element in O(1), and 
Boyer-Moore needs that kind of access to be efficient.
Why doesn't filter provide O(1) random access? Because to get the Nth 
element you'd have to check at least N (and potentially unboundedly many) 
of the preceding elements, in case some of them get filtered out.
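
A small sketch of that point, in case it helps: the range traits in 
std.range can be checked at compile time. The int array and the names 
data and odd are only stand-ins for illustration, not your real data.

```d
import std.algorithm : filter;
import std.range : isForwardRange, isRandomAccessRange;

void main()
{
    int[] data = [7, 1, 1, 0, 4];

    // A plain array supports O(1) indexing.
    static assert(isRandomAccessRange!(int[]));

    // The filtered view is lazy: it can only be walked front to back.
    auto odd = data.filter!(x => x % 2 == 1);
    static assert(isForwardRange!(typeof(odd)));
    static assert(!isRandomAccessRange!(typeof(odd)));

    // Reading the first surviving element still works as usual.
    assert(odd.front == 7);
}
```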


> Any help?

If I had this sort of problem, I'd use something along these lines:

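import std.algorithm : filter, find;
import std.array : array;
import std.stdio : File;
import std.uni : isWhite;

auto file = File("yourfile");
foreach (line; file.byLine())
{
     // this copies all digits of the line to a new array
     auto onlyDigits = array(filter!((x) { return !isWhite(x); })(line));
     auto result = find(onlyDigits, ... ); // your query here
     // ...
}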
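
Note that the loop above searches each line on its own, so a sequence that 
straddles a line break would be missed. If the whole file fits in memory, 
one possible sketch is to strip the whitespace from the entire text first 
and then hand the result to boyerMooreFinder. The helper stripWhitespace 
and the sample text are made up for illustration; for a real file the text 
could come from std.file.readText instead of a literal.

```d
import std.algorithm : boyerMooreFinder, find;
import std.array : appender;
import std.ascii : isWhite;

/// Hypothetical helper: copies `text` without its ASCII whitespace.
string stripWhitespace(const(char)[] text)
{
    auto buf = appender!string();
    foreach (char c; text)
        if (!isWhite(c))
            buf.put(c);
    return buf.data;
}

void main()
{
    // Stand-in for the file contents (two short lines of digit groups).
    string text = "71104 08924 72394\n48245 08311\n";
    string digits = stripWhitespace(text);

    // "9448" only exists across the line break: ...72394 | 48245...
    auto hit = find(digits, boyerMooreFinder("9448"));
    assert(hit.length != 0);

    // Offset of the match within the whitespace-free stream.
    auto offset = digits.length - hit.length;
    assert(digits[offset .. offset + 4] == "9448");
}
```

This keeps the haystack and the needle as the same type (string), which 
is what the finder expects, at the cost of one extra copy of the data.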

> Thanks


-- 
Dmitry Olshansky


