[Issue 17161] [REG 2.072.2] Massive Regex Slowdown

via Digitalmars-d-bugs digitalmars-d-bugs at puremagic.com
Thu Feb 9 12:32:26 PST 2017


https://issues.dlang.org/show_bug.cgi?id=17161

--- Comment #2 from Jack Stouffer <jack at jackstouffer.com> ---
Bad news: I see a similar performance decrease for run-time regex as well.

# 2.073.0
$ dmd -O -inline -release test2.d && cat input5000000.txt | time ./test2
./test2  4.44s user 0.09s system 98% cpu 4.591 total

# 2.072.2
$ ~/digger/result/bin/dmd -O -inline -release test2.d && cat input5000000.txt | time ./test2
./test2  3.20s user 0.09s system 98% cpu 3.344 total

I consistently get a longer run time with 2.073.0, around a second and a half;
the run shown above differs by about 1.25 s (4.59 s vs. 3.34 s total).

Code (test2.d):

import std.algorithm;
import std.array;
import std.range;
import std.regex;
import std.stdio;
import std.typecons;
import std.utf;

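// Patterns to count: an 8-mer and its reverse complement, followed by
// variants with one position replaced by a character class.
static variants = [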
    "agggtaaa|tttaccct",
    "[cgt]gggtaaa|tttaccc[acg]",
    "a[act]ggtaaa|tttacc[agt]t",
    "ag[act]gtaaa|tttac[agt]ct",
    "agg[act]taaa|ttta[agt]cct",
    "aggg[acg]aaa|ttt[cgt]ccct",
    "agggt[cgt]aa|tt[acg]accct",
    "agggta[cgt]a|t[acg]taccct",
    "agggtaa[cgt]|[acg]ttaccct",
];

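void main()
{
    // Read all of stdin into one string, keeping the line terminators.
    auto app = appender!string;
    app.reserve(5_000_000);
    app.put(stdin
        .byLineCopy(KeepTerminator.yes)
        .joiner
        .byChar);

    auto seq = app.data;

    // Strip header lines (those starting with '>') and all newlines.
    auto regexLineFeeds = regex(">.*\n|\n");
    seq = seq.replaceAll(regexLineFeeds, "");

    // Count the matches of each pattern; each string is compiled at run time.
    foreach (pattern; variants)
    {
        writeln(pattern, " ", seq.matchAll(pattern).walkLength);
    }
}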
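
To separate the regex cost from reading stdin, the matching step could also be timed inside the program. The following is only a minimal sketch and not part of the original test: the helper name countMatches and the short sample sequence are hypothetical, and MonoTime from core.time is used on the assumption that it is available in both compiler versions being compared. The measured interval includes compiling the pattern.

```d
import core.time : MonoTime;
import std.range : walkLength;
import std.regex : matchAll, regex;
import std.stdio : writefln;

// Hypothetical helper: compiles the pattern at run time and counts
// the non-overlapping matches in seq.
size_t countMatches(string seq, string pattern)
{
    auto re = regex(pattern);
    return seq.matchAll(re).walkLength;
}

void main()
{
    // Stand-in input (assumption); the real test reads about 5 MB from stdin.
    string seq = "agggtaaacccgggtttaccctagggtaaa";

    // Time only the regex work (pattern compilation plus matching).
    immutable start = MonoTime.currTime;
    immutable count = countMatches(seq, "agggtaaa|tttaccct");
    immutable elapsed = MonoTime.currTime - start;

    writefln("%s matches in %s", count, elapsed);
}
```

With the real input in place of the sample string, running this under both compilers would show whether the difference sits in the regex engine rather than in I/O.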
