Regex matching cause lots of _d_arrayliteralTX calls

H. S. Teoh hsteoh at quickfur.ath.cx
Thu Sep 26 17:24:41 PDT 2013


On Fri, Sep 27, 2013 at 12:00:45AM +0200, JR wrote:
> I'm working on a toy IRC bot. Much of the logic involved is
> translating the incoming raw IRC string into something that makes
> sense (so now I have two problems, etc). But I managed to cook up a
> regex that so far seems to work well. Time for callgrind!
> 
> Grouped by source file, most time is spent in regex.d (as would seem
> natural) but more time is spent in gc.d than I would have expected.
> Looking at the callgraph I see that there's a curious amount of
> calls to _d_arrayliteralTX from (around) where the regex matching is
> done. (There's some inlining going on.)
> 
> Example:   http://dpaste.dzfl.pl/3932a231 (needs dmd head)
> 
> Callgraph: http://i.imgur.com/AZEutCE.png

Actually, never mind what I said in the last post. Obviously you're
already using ctRegex. The problem is in this code:

	scope fields = raw.match(ircRegexPattern).front;


> TL;DR: 67 regex matches are done in that example snippet, on real
> but (hopefully) anonymized raw irc strings; _d_arrayliteralTX sees
> 800+ calls.

Herein lies the hint: the exact number of calls (as seen from your
callgraph) is 804, and 804 / 67 = 12 (exactly). This means that there
are precisely 12 calls to _d_arrayliteralTX per regex match. So that
leads to the question of why this is happening.

I don't know the answer, but does it help if you don't call .front on
the match object? I.e., try this:

	auto m = raw.match(ircRegexPattern);
	auto c = m.captures;
	// c now contains the captured fields. For example, c[1] returns
	// the matching text for the first pair of parentheses, c[2]
	// returns the matching text for the second pair, etc. c[0]
	// returns the entire match (uninteresting in your case).


T

-- 
Windows: the ultimate triumph of marketing over technology. -- Adrian von Bidder

