std.regex performance
Martin Nowak
dawg at dawgfoto.de
Wed Feb 8 18:43:15 PST 2012
On Wed, 08 Feb 2012 22:44:25 +0100, Jesse Phillips
<jessekphillips+D at gmail.com> wrote:
> I've finally moved to the new regex for some real code. I'm seeing a
> major change in performance when checking if a large number of words
> contain a digit.
>
> The english.dic file contains 134,950 entries
>
> With
> 2.056: 0.22sec
> 2.058: 7.65sec
>
> I don't expect a correction for this would make it in 2.058 as it is
> likely an issue in 2.057.
>
> --------
> import std.file;
> import std.string;
> import std.datetime;
> import std.regex;
>
> private int[string] model;
>
> void main() {
>     auto name = "english.dic";
>     foreach(w; std.file.readText(name).toLower.splitLines)
>         model[w] += 1;
>
>     foreach(w; std.string.split(readText(name)))
>         if(!match(w, regex(r"\d")).empty)
>         {}
> }
>
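There are some more performance issues in the test code itself: the regex
is rebuilt for every word, and the file is read into memory twice.
D has a nice built-in profiler (compile with dmd -profile) to find such issues.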
----------
import std.algorithm, std.stdio, std.string, std.path, std.regex;

private int[string] model;

int main(string[] args)
{
    if (args.length != 2)
    {
        std.stdio.stderr.writefln("usage: %s <file>",
            std.path.baseName(args[0]));
        return 1;
    }
    // Build the regex once, outside the loops.
    auto re = std.regex.regex(r"\d");
    foreach(line; std.stdio.File(args[1], "r").byLine())
    {
        // Bug 6791: splitter is UTF-8 unsafe
        foreach(w; std.algorithm.splitter(line))
        {
            if(!std.regex.match(w, re).empty)
            {
            }
        }
        // byLine reuses its buffer, so store an immutable copy as the key.
        std.string.toLowerInPlace(line);
        model[line.idup] += 1;
    }
    return 0;
}
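
To see how much of the time goes into rebuilding the regex, as opposed to matching, the two variants can be timed side by side. This is only a minimal sketch: countRebuilt, countHoisted and the sample word list are made-up names for illustration, and it uses matchFirst and std.datetime.stopwatch from current Phobos rather than the 2.058-era API. Newer Phobos versions may cache recently compiled patterns, so the gap can be much smaller than the numbers quoted above.

```d
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.regex : matchFirst, regex;
import std.stdio : writefln;

// Hypothetical helper: counts words containing a digit,
// constructing the regex again for every word (as in the original test).
size_t countRebuilt(string[] words)
{
    size_t hits;
    foreach (w; words)
    {
        if (!matchFirst(w, regex(r"\d")).empty)
            ++hits;
    }
    return hits;
}

// Hypothetical helper: same count, but the regex is built once up front.
size_t countHoisted(string[] words)
{
    auto re = regex(r"\d");
    size_t hits;
    foreach (w; words)
    {
        if (!matchFirst(w, re).empty)
            ++hits;
    }
    return hits;
}

void main()
{
    // Stand-in for the dictionary; replace with the real word list.
    auto words = ["apple", "b4nana", "cherry", "d8te", "r2d2"];

    auto sw = StopWatch(AutoStart.yes);
    auto rebuilt = countRebuilt(words);
    auto tRebuilt = sw.peek();

    sw.reset(); // clears the elapsed time, the watch keeps running
    auto hoisted = countHoisted(words);
    auto tHoisted = sw.peek();

    // Both variants must agree on the result; only the timing differs.
    assert(rebuilt == hoisted);
    writefln("rebuilt: %s hits in %s", rebuilt, tRebuilt);
    writefln("hoisted: %s hits in %s", hoisted, tHoisted);
}
```

With a five-word list the timings are noise; feeding in the full dictionary is what makes the comparison meaningful.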