[Issue 13532] New: std.regex performance (enums; regex vs ctRegex)
via Digitalmars-d-bugs
digitalmars-d-bugs at puremagic.com
Thu Sep 25 19:02:36 PDT 2014
https://issues.dlang.org/show_bug.cgi?id=13532
Issue ID: 13532
Summary: std.regex performance (enums; regex vs ctRegex)
Product: D
Version: D2
Hardware: All
OS: All
Status: NEW
Keywords: performance
Severity: enhancement
Priority: P5
Component: Phobos
Assignee: nobody at puremagic.com
Reporter: thecybershadow at gmail.com
I noticed something strange after accidentally introducing a performance
regression in a program using std.regex. Benchmark program:
///////////////////////////////////////////
import std.algorithm;
import std.array;
import std.conv;
import std.datetime;
import std.file;
import std.regex;
import std.stdio;
import std.string;

enum expr = `;.*`;
enum repl = "";
enum fn = `alice30.txt`;
enum N = 5000;

string[] lines;

void regexInline()
{
    lines
        .map!(line => line
            .replaceAll(regex(expr), repl)
        )
        .array
    ;
}

void regexAuto()
{
    auto r = regex(expr);
    lines
        .map!(line => line
            .replaceAll(r, repl)
        )
        .array
    ;
}

void regexStatic()
{
    static r = regex(expr);
    lines
        .map!(line => line
            .replaceAll(r, repl)
        )
        .array
    ;
}

void regexEnum()
{
    enum r = regex(expr);
    lines
        .map!(line => line
            .replaceAll(r, repl)
        )
        .array
    ;
}

void ctRegexInline()
{
    lines
        .map!(line => line
            .replaceAll(ctRegex!expr, repl)
        )
        .array
    ;
}

void ctRegexAuto()
{
    auto r = ctRegex!expr;
    lines
        .map!(line => line
            .replaceAll(r, repl)
        )
        .array
    ;
}

void ctRegexStatic()
{
    static r = ctRegex!expr;
    lines
        .map!(line => line
            .replaceAll(r, repl)
        )
        .array
    ;
}

void ctRegexEnum()
{
    enum r = ctRegex!expr;
    lines
        .map!(line => line
            .replaceAll(r, repl)
        )
        .array
    ;
}

Regex!char re(string pattern)()
{
    static Regex!char r;
    if (r.empty)
        r = regex(pattern);
    return r;
}

void reInline()
{
    lines
        .map!(line => line
            .replaceAll(re!expr, repl)
        )
        .array
    ;
}

alias funcs = TypeTuple!(
    regexInline,
    regexAuto,
    regexStatic,
    regexEnum,
    ctRegexInline,
    ctRegexAuto,
    ctRegexStatic,
    ctRegexEnum,
    reInline,
);

void main()
{
    auto text = cast(string)read(fn);
    lines = text.splitLines();
    auto results = benchmark!funcs(N);
    foreach (i, func; funcs)
        writeln(
            __traits(identifier, func),
            "\t",
            to!Duration(results[i]),
        );
}
///////////////////////////////////////////
Here are my results:
regexInline     10 secs, 174 ms, 254 μs, and 2 hnsecs
regexAuto        8 secs, 249 ms,  92 μs, and 5 hnsecs
regexStatic      8 secs, 155 ms, 231 μs, and 1 hnsec
regexEnum       19 secs, 358 ms,  66 μs, and 8 hnsecs
ctRegexInline   21 secs, 399 ms, 346 μs, and 5 hnsecs
ctRegexAuto     10 secs,  57 ms, and 418 μs
ctRegexStatic   10 secs,  66 ms, 489 μs, and 9 hnsecs
ctRegexEnum     21 secs, 593 ms, 486 μs, and 9 hnsecs
reInline         8 secs, 430 ms, 852 μs, and 3 hnsecs
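The first surprise for me was that declaring a regex object (either Regex or
StaticRegex) with "enum" was so much slower: more than twice as slow in both
cases (regexEnum at about 19.4 s vs. regexAuto at about 8.2 s, ctRegexEnum at
about 21.6 s vs. ctRegexAuto at about 10.1 s). It makes sense now that I think
about it: an enum is a manifest constant, so its value is substituted at every
use, and creating a struct literal inside a loop is more expensive than
referencing one already residing somewhere in memory. Perhaps it would be worth
mentioning in the documentation that enum should be avoided with compiled
regexes.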
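To illustrate the point about enum, here is a minimal sketch. It is not part of
the benchmark, and the names enumArr, staticArr and stripComments are
hypothetical, made up for this example. It uses an array literal rather than a
regex to show that each use of an enum builds a fresh value, and then shows the
cheaper pattern of building the regex once and reusing it:

///////////////////////////////////////////
import std.regex;

// Manifest constant: the initializer is substituted at each point of use,
// so every use of enumArr builds a new array.
enum int[] enumArr = [1, 2, 3];

// Static data: a single copy that every use refers to.
static immutable int[] staticArr = [1, 2, 3];

// Hypothetical helper: strips everything from the first ';' on each line.
// The regex is constructed once, outside the loop, and reused.
string[] stripComments(string[] lines)
{
    auto r = regex(`;.*`);
    string[] result;
    foreach (line; lines)
        result ~= line.replaceAll(r, "");
    return result;
}

void main()
{
    // Two uses of the enum yield two distinct arrays.
    auto a = enumArr;
    auto b = enumArr;
    assert(a.ptr !is b.ptr);

    // Two uses of the static variable refer to the same memory.
    assert(staticArr.ptr is staticArr.ptr);

    assert(stripComments(["x = 1 ; set x", "no comment"])
        == ["x = 1 ", "no comment"]);
}
///////////////////////////////////////////

I would assume the same substitution is what happens with "enum r = regex(expr)"
inside the map lambda above, though I have not inspected the generated code.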
The second surprise was that ctRegex was slower than regular regex (about
10.1 s vs. 8.2 s for the auto and static variants), although the difference is
not significant.
I don't know whether this needs any action, feel free to WONTFIX.