Searching for a string in a text buffer with a regular expression
maxpat78
maxpat78 at yahoo.it
Fri Dec 6 00:53:04 PST 2013
While porting a simple Python script to D, I found the following
problem.
I need to read in a few thousand small text files and search
each one for a match against a given regular expression.
Obviously, the program can't (and should not) be certain about
the encoding of each input file.
I initially used read() and cast the result with cast(char[]),
but at some point the regex engine threw an exception: it had
encountered a byte sequence it couldn't decode as UTF-8.
That is fair enough, since char[] (UTF-8 text) is not byte[].
Now I'm casting to Latin1String, since I know this is the
right encoding for the input buffers, and it works fine at
last... but what if I needed to handle a RAW (binary?
unknown encoding?) buffer?
Is there a simple and elegant solution in D for such a case?
Python didn't give me such problems!
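
For comparison, here is roughly what the Python side amounts to. This
is only a sketch: the pattern and the function names are made up for
illustration and are not taken from my actual script. Python's re
module accepts a bytes pattern against a bytes buffer, so nothing is
ever decoded, and decoding as Latin-1 cannot fail because every byte
value maps to a code point.

```python
import re

# Hypothetical pattern, for illustration only. The b prefix makes it a
# bytes pattern, so it is matched against raw bytes with no decoding.
RAW_PATTERN = re.compile(rb"ERROR \d+")
TEXT_PATTERN = re.compile(r"ERROR \d+")


def find_in_raw(buf: bytes):
    """Search an undecoded buffer; return the matched bytes or None."""
    m = RAW_PATTERN.search(buf)
    return m.group(0) if m else None


def find_in_latin1(buf: bytes):
    """Decode as Latin-1 first, then search; return the matched str or None."""
    m = TEXT_PATTERN.search(buf.decode("latin-1"))
    return m.group(0) if m else None
```

The first function is the "RAW buffer" case I am asking about; the
second corresponds to the Latin1String workaround described above.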