Searching for a string in a text buffer with a regular expression
maxpat78
maxpat78 at yahoo.it
Sun Dec 8 23:55:43 PST 2013
I mean a code fragment like this:
foreach(i; 1..2085)
{
    // Bugbug: when we read in the buffer, we can't know anything about its encoding...
    // But REGEX could fail if it contained unknown chars!
    Latin1String buf;
    string s;
    try
    {
        buf = cast(Latin1String) read(format("psi\\psi%04d.htm", i));
        transcode(buf, s);
    }
    catch (Exception e)
    {
        writeln("Last record (", i, ") reached.");
        exit(1);
    }
    // Exception "Invalid UTF-8 sequence @index 1" in file 55
    enum rx = ctRegex!(`<p class="aggiornamentoAlbo">.+?</div>`, "gs");
    auto m = match(s, rx);
    if (! m.empty())
    {
        if (indexOf(m.captures[0], "xxxxxxxx", 0) > -1 &&
            indexOf(m.captures[0], "1983", 0) > -1)
            writeln(m.captures[0]);
    }
}
The question is: what kind of cast should I use to safely scan
(i.e. without conversion exceptions being raised) any kind of
textual (or binary) buffer, like in Python 2.7.x?
Thanks!