Regex matching cause lots of _d_arrayliteralTX calls

H. S. Teoh hsteoh at quickfur.ath.cx
Thu Sep 26 17:08:53 PDT 2013


On Fri, Sep 27, 2013 at 01:51:51AM +0200, JR wrote:
> On Thursday, 26 September 2013 at 23:04:22 UTC, bearophile wrote:
> >I am not sure how an IRC bot could consume more than a tiny
> >fraction of the CPU time of a modern multi-GHz processor.
> 
> Nor does it bite into my 8 gigabytes of RAM.
> 
> Forgive me, but the main culprit in all of this is still me doing it
> wrong. Can I keep the same RegexMatcher (perhaps as a struct member)
> and reuse it between matchings?

Not sure what you mean, but are you compiling regexes every time you use
them? If so, you should be storing them instead, for example:

	import std.regex;

	// Place to store precompiled regexes.
	struct MyContext {
		Regex!char pattern1;
		Regex!char pattern2;
		Regex!char pattern3;
		...
	}

	// Presumably, this will run far more frequently than
	// updatePatterns. The context is taken by ref so the stored
	// regexes aren't copied on every call.
	void runMatches(ref MyContext ctxt, string message) {
		if (message.match(ctxt.pattern1)) {
			...
		} else if (message.match(ctxt.pattern2)) {
			...
		}
		...
	}

	// Presumably, this only runs once in a while, so you save on
	// the cost of compiling/storing the regex every single time you
	// run a match.
	void updatePatterns(ref MyContext ctxt,
		string newPattern1,
		string newPattern2,
		string newPattern3, ...)
	{
		ctxt.pattern1 = regex(newPattern1);
		ctxt.pattern2 = regex(newPattern2);
		ctxt.pattern3 = regex(newPattern3);
		...
	}

So when you need to update your regexes, say based on reloading a config
file or something, you'd run updatePatterns() to compile all the
patterns, then runMatches() can be used during the normal course of your
program. This should save on a lot of overhead.
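
To make that concrete, here's a minimal self-contained sketch of the same
idea. The names BotPatterns, compilePatterns and classify, and the two
IRC-ish patterns, are made up for illustration and aren't taken from your
code. It uses matchFirst, which is available in newer Phobos releases;
match should work along the same lines.

```d
import std.regex;
import std.stdio : writeln;

// Hypothetical container for regexes that are compiled once and reused.
struct BotPatterns {
    Regex!char ping;
    Regex!char privmsg;
}

// Compile every pattern a single time, e.g. at startup or on config reload.
BotPatterns compilePatterns() {
    BotPatterns p;
    p.ping = regex(`^PING :(\S+)`);
    p.privmsg = regex(`^:([^!]+)!\S+ PRIVMSG (\S+) :(.*)$`);
    return p;
}

// Hot path: only matching happens here, no regex compilation.
string classify(ref BotPatterns p, string line) {
    if (!matchFirst(line, p.ping).empty)
        return "ping";
    if (!matchFirst(line, p.privmsg).empty)
        return "privmsg";
    return "other";
}

void main() {
    auto patterns = compilePatterns();
    foreach (line; [
        "PING :irc.example.net",
        ":nick!user@host PRIVMSG #d :hello there",
        "NOTICE AUTH :*** Looking up your hostname",
    ])
        writeln(classify(patterns, line));
}
```

The point of the split is that compilePatterns() is the only place that
pays for regex(), while classify() can be called for every incoming line.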

Of course, if you have regexes that are fixed at compile-time, you could
use ctRegex to *really* speed things up. Or, if that makes your
compilation too slow (it does have a tendency of doing that), initialize
your patterns in a static this() block:

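	// Module-level variables are thread-local by default, and
	// static this() runs once per thread at startup.
	Regex!char predeterminedPattern1;
	Regex!char predeterminedPattern2;
	static this() {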
		predeterminedPattern1 = regex(`...`);
		predeterminedPattern2 = regex(`...`);
	}
	...
	void matchStuff(string message) {
		if (message.match(predeterminedPattern1)) {
			...
		}
		...
	}
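
For the ctRegex route, a rough sketch might look like the following. The
function name pingToken and the pattern are hypothetical, chosen only to
have something to match against; the pattern has to be known at compile
time for this to work.

```d
import std.regex;

// Returns the token after "PING :", or null if the line is not a PING.
string pingToken(string line) {
    // The pattern is a compile-time constant, so ctRegex can generate
    // the matcher during compilation instead of at run time.
    auto pingPattern = ctRegex!(`^PING :(\S+)`);
    auto m = matchFirst(line, pingPattern);
    if (m.empty)
        return null;
    return m[1];
}

void main() {
    assert(pingToken("PING :irc.example.net") == "irc.example.net");
    assert(pingToken("PRIVMSG #d :hi") is null);
}
```

The call sites look the same as with a runtime regex, so switching between
the two later is mostly a matter of changing how the pattern is created.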


> >And I am not sure if regular expressions are a good idea to
> >implement an IRC interface.
> 
> I dare say I disagree!

Yeah, anything involving heavy string processing is probably best done
using regexes rather than ad hoc string slicing, which is bug-prone and
hard to maintain.
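
As a small illustration, here's a sketch that pulls the sender, target and
text out of a PRIVMSG line with a single regex and its captures. The
IrcMessage struct, parsePrivmsg and the simplified pattern are assumptions
for the sake of the example, not a complete IRC grammar.

```d
import std.regex;

// Hypothetical result type for a parsed PRIVMSG line.
struct IrcMessage {
    string sender;
    string target;
    string text;
}

// Compiled once per thread in the module constructor, then reused.
Regex!char privmsgPattern;

static this() {
    privmsgPattern = regex(`^:([^!]+)!\S+ PRIVMSG (\S+) :(.*)$`);
}

// Fills msg and returns true if line looks like a PRIVMSG.
bool parsePrivmsg(string line, out IrcMessage msg) {
    auto m = matchFirst(line, privmsgPattern);
    if (m.empty)
        return false;
    // Captures 1..3 correspond to the three parenthesized groups.
    msg = IrcMessage(m[1], m[2], m[3]);
    return true;
}

void main() {
    IrcMessage msg;
    assert(parsePrivmsg(":nick!user@host PRIVMSG #d :hello there", msg));
    assert(msg.sender == "nick");
    assert(msg.target == "#d");
    assert(msg.text == "hello there");
    assert(!parsePrivmsg("PING :irc.example.net", msg));
}
```

Doing the same with indexOf and slices means tracking several offsets by
hand, which is where the off-by-one bugs tend to creep in.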


T

-- 
First Rule of History: History doesn't repeat itself -- historians merely repeat each other.

