Poor regex performance?
Julian
julian.fondren at gmail.com
Thu Apr 4 09:53:06 UTC 2019
The following code, which just runs a regex against a large exim log
to report on top senders, is 140 times slower than similar C code using
PCRE when compiled with just -O. With a bunch of other flags I got it
down to only 13x slower than C code that's using libc regcomp/regexec.

import std.stdio, std.string, std.regex, std.array, std.algorithm;

T min(T)(T a, T b) {
    if (a < b) return a;
    return b;
}

void main() {
    ulong[string] emailcounts;
    auto re = ctRegex!(r"(?:\S+ ){3,4}<= ([^@]+@(\S+))");

    foreach (line; File("exim_mainlog").byLine()) {
        auto m = line.match(re);
        if (m) {
            ++emailcounts[m.front[1].idup];
        }
    }

    string[] senders = emailcounts.keys;
    sort!((a, b) { return emailcounts[a] > emailcounts[b]; })(senders);
    foreach (i; 0 .. min(senders.length, 5)) {
        writefln("%5s %s", emailcounts[senders[i]], senders[i]);
    }
}

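To make it concrete what the pattern picks out of each line, here is a minimal sketch that applies it to one line. The log line is made up (it follows the usual exim mainlog layout but is not from the real log), `senderOf` and `pattern` are hypothetical names, and it uses a runtime `regex` with `matchFirst` instead of `ctRegex` purely to keep the example small.

```d
import std.regex, std.stdio;

// Same pattern as in the program above: group 1 is the full sender
// address, group 2 is the domain part.
enum pattern = r"(?:\S+ ){3,4}<= ([^@]+@(\S+))";

// Hypothetical helper: returns the captured sender address, or null
// when the line is not a "<=" (message arrival) record.
string senderOf(const(char)[] line, Regex!char re) {
    auto m = line.matchFirst(re);
    return m.empty ? null : m[1].idup;
}

void main() {
    // Compile once and reuse, as the main loop above does.
    auto re = regex(pattern);

    // Made-up arrival line, for illustration only.
    auto line = "2019-04-04 09:53:06 1hBzQK-0002Xy-Ab <= alice@example.com "
              ~ "H=mail.example.com P=esmtp S=1234";
    assert(senderOf(line, re) == "alice@example.com");
    writeln(senderOf(line, re));

    // A delivery line ("=>") contains no "<= " and yields no sender.
    assert(senderOf("2019-04-04 09:53:07 1hBzQK-0002Xy-Ab => bob@example.org", re) is null);
}
```

The `.idup` copy matters in the real loop because `byLine()` reuses its buffer between iterations, so a slice kept as an associative-array key would otherwise be overwritten.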
Other code's available at https://github.com/jrfondren/topsender-bench

I get D down to 1.2x slower with PCRE and getline().

I wrote this part of the way through chapter 1 of "The D Programming
Language", so my question is mainly: is this a fair result? Is std.regex
very slow, so that I should reach for PCRE if regex speed matters? Or is
this code severely flawed somehow? I'm using a random production log;
I'm not trying to make things difficult.

Relatedly, how can I add custom compiler flags to rdmd, in a D script?
For example, -L-lpcre