Searching for a string in a text buffer with a regular expression

maxpat78 maxpat78 at yahoo.it
Sun Dec 8 23:55:43 PST 2013


I mean a code fragment like this:

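	import std.encoding : Latin1String, transcode;
	import std.file : FileException, read;
	import std.format : format;
	import std.regex : ctRegex, matchFirst;
	import std.stdio : writeln;
	import std.string : indexOf;

	void main()
	{
		foreach (i; 1 .. 2085)
		{
			// Bugbug: when we read in the buffer, we can't know anything about its encoding...
			// But REGEX could fail if it contained unknown chars!
			Latin1String buf;
			string s;

			try
			{
				buf = cast(Latin1String) read(format("psi\\psi%04d.htm", i));
				transcode(buf, s);
			}
			catch (FileException e)
			{
				// A missing file marks the end of the numbered series.
				writeln("Last record (", i, ") reached.");
				break;
			}

			// Exception "Invalid UTF-8 sequence @index 1" in file 55
			// "s" flag: '.' also matches newlines, so a match may span lines.
			// Only the first match is used, so matchFirst replaces match + "g".
			auto rx = ctRegex!(`<p class="aggiornamentoAlbo">.+?</div>`, "s");
			auto m = matchFirst(s, rx);

			if (!m.empty)
			{
				if (indexOf(m[0], "xxxxxxxx") != -1 && indexOf(m[0], "1983") != -1)
					writeln(m[0]);
			}
		}
	}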
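"Invalid UTF-8 sequence" is the kind of message Phobos' `UTFException` (from `std.utf`) carries when a byte buffer that is not well-formed UTF-8 is treated as a `string`. A minimal sketch of that check, assuming the pages contain Latin-1 bytes such as 0xE0 ('à'), which is not a valid UTF-8 sequence on its own; the helper name `isValidUtf8` is hypothetical:

```d
import std.utf : UTFException, validate;

/// Hypothetical helper: reports whether a raw byte buffer is well-formed UTF-8.
/// std.utf.validate throws UTFException at the first malformed sequence.
bool isValidUtf8(const(ubyte)[] bytes)
{
    try
    {
        // Reinterpret the bytes as UTF-8 code units without copying.
        validate(cast(const(char)[]) bytes);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}
```

Running such a check on each buffer would at least show which files (file 55, for instance) are not UTF-8, before any regex sees them.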

The question is: what kind of cast should I use to safely scan any kind of textual (or binary) buffer, i.e. without conversion exceptions being raised, as one can in Python 2.7.x?
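One way to get Python 2.7-like behaviour, sketched here as an assumption rather than as the list's answer: read the file as raw `ubyte[]` and widen every byte to the `dchar` with the same value. Latin-1 maps bytes 0x00 to 0xFF one-to-one onto code points U+0000 to U+00FF, so this conversion cannot fail for any input, text or binary, and a regex compiled from a `dstring` pattern then never decodes UTF-8. The names `widenBytes` and `findRecord` are hypothetical; the pattern and the two search terms are taken from the code above.

```d
import std.algorithm.iteration : map;
import std.algorithm.searching : canFind;
import std.array : array;
import std.regex : matchFirst, regex;

/// Hypothetical helper: widens each byte to the code point of the same value
/// (the Latin-1 mapping). Every byte value is a valid code point,
/// so this never throws, whatever the buffer contains.
dchar[] widenBytes(const(ubyte)[] bytes)
{
    return bytes.map!(b => cast(dchar) b).array;
}

/// Hypothetical helper: returns the first <p class="aggiornamentoAlbo"> block
/// if it contains both search terms, or null otherwise.
/// The pattern is a dstring (`...`d), so the regex runs on whole code points.
dchar[] findRecord(const(ubyte)[] bytes)
{
    // "s": '.' also matches newlines; ".+?" stops at the first </div>.
    auto rx = regex(`<p class="aggiornamentoAlbo">.+?</div>`d, "s");
    auto text = widenBytes(bytes);
    auto m = matchFirst(text, rx);
    if (!m.empty && m[0].canFind("xxxxxxxx"d) && m[0].canFind("1983"d))
        return m[0];
    return null;
}
```

The cost is four bytes of memory per input byte, which should not matter for small HTML pages. Bytes are always read as Latin-1, so a page that is really UTF-8 shows a character such as 'à' as two widened code points; matching ASCII markup and digits, as in the loop above, is unaffected.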

Thanks!


More information about the Digitalmars-d-learn mailing list