Regex matching cause lots of _d_arrayliteralTX calls
H. S. Teoh
hsteoh at quickfur.ath.cx
Thu Sep 26 17:24:41 PDT 2013
On Fri, Sep 27, 2013 at 12:00:45AM +0200, JR wrote:
> I'm working on a toy IRC bot. Much of the logic involved is
> translating the incoming raw IRC string into something that makes
> sense (so now I have two problems, etc). But I managed to cook up a
> regex that so far seems to work well. Time for callgrind!
>
> Grouped by source file, most time is spent in regex.d (as would seem
> natural) but more time is spent in gc.d than I would have expected.
> Looking at the callgraph I see that there's a curious amount of
> calls to _d_arrayliteralTX from (around) where the regex matching is
> done. (There's some inlining going on.)
>
> Example: http://dpaste.dzfl.pl/3932a231 (needs dmd head)
>
> Callgraph: http://i.imgur.com/AZEutCE.png

Actually, never mind what I said in the last post. Obviously you're
already using ctRegex. The problem is in this code:

	scope fields = raw.match(ircRegexPattern).front;

> TL;DR: 67 regex matches are done in that example snippet, on real
> but (hopefully) anonymized raw irc strings; _d_arrayliteralTX sees
> 800+ calls.
Herein lies the hint: the exact number of calls (as seen from your
callgraph) is 804, and 804 / 67 = 12 (exactly). This means that there
are precisely 12 calls to _d_arrayliteralTX per regex match. So that
leads to the question of why this is happening.

I don't know the answer, but does it help if you don't call .front on
the match object? I.e., try this:

	auto m = raw.match(ircRegexPattern);
	auto c = m.captures;
	// c now contains the captured fields: c[1] returns the matching
	// text for the first pair of parentheses, c[2] returns the
	// matching text for the second pair, etc. c[0] returns the
	// entire match (uninteresting in your case).

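For something you can compile and profile in isolation, here is a minimal sketch of the same idea. The pattern `simplePattern`, the helper `command`, and the sample input line are all made up for illustration; they are not your actual ircRegexPattern or data. It also uses a runtime regex() rather than ctRegex just to keep the example short.

```d
import std.regex;
import std.stdio;

// Hypothetical, much-simplified IRC pattern: ":prefix COMMAND rest".
// It stands in for ircRegexPattern, which is not shown in this thread.
enum simplePattern = `^:(\S+) (\S+) (.*)$`;

/// Returns the command field (second capture), or null if there is no match.
string command(string raw)
{
    auto re = regex(simplePattern);   // compiled on every call, for brevity only
    auto m = raw.match(re);
    if (m.empty)
        return null;
    auto c = m.captures;              // captures of the current match, no explicit .front
    return c[2];                      // c[0] is the whole match, c[1] the prefix
}

void main()
{
    // Made-up sample line in the general shape of a raw IRC message.
    writeln(command(":nick!user@host PRIVMSG #chan :hello"));
}
```

Whether this actually lowers the number of _d_arrayliteralTX calls is something only another callgrind run can tell; the sketch just shows the access pattern.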
T
--
Windows: the ultimate triumph of marketing over technology. -- Adrian von Bidder