[Issue 3136] Incorrect and strange behavior of std.regexp.RegExp if using a pattern with optional prefix and suffix longer than 1 char

d-bugmail at puremagic.com
Wed Jul 8 12:06:27 PDT 2009


http://d.puremagic.com/issues/show_bug.cgi?id=3136

--- Comment #1 from Marcello Gnani <marcellognani at gmail.com>  2009-07-08 12:06:26 PDT ---
I had time to investigate further; the problem is caused by an incorrect
optimization that Phobos performs on the optional prefix.
The constructor of the RegExp object calls "public void compile(string
pattern, string attributes)", which builds a correct internal RegExp program;
it then attempts an optimization by calling the "void optimize()" function. In
this function, during the optimization of the REbit opcode (the opcode that
implements the prefix match when the prefix is longer than one letter), the
optionality of the prefix is lost, leading to the incorrect behavior reported.

The simplest patch I came up with is a slight modification of the "int
starrchars(Range r, const(ubyte)[] prog)" function (which is called by
"optimize"). Its current REnm/REnmq case:
. . .
        case REnm:
        case REnmq:
        // len, n, m, ()
        len = (cast(uint *)&prog[i + 1])[0];
        n   = (cast(uint *)&prog[i + 1])[1];
        m   = (cast(uint *)&prog[i + 1])[2];
        pop = &prog[i + 1 + uint.sizeof * 3];
        if (!starrchars(r, pop[0 .. len]))
            return 0;
        if (n)
            return 1;
        i += 1 + uint.sizeof * 3 + len;
        break;
. . .
should instead return 0 when the n operand of the REnm opcode is 0 (only the
line before the break statement changes); this prevents the optimizer from
inserting the first filter that kills the optionality:
. . .
        case REnm:
        case REnmq:
        // len, n, m, ()
        len = (cast(uint *)&prog[i + 1])[0];
        n   = (cast(uint *)&prog[i + 1])[1];
        m   = (cast(uint *)&prog[i + 1])[2];
        pop = &prog[i + 1 + uint.sizeof * 3];
        if (!starrchars(r, pop[0 .. len]))
            return 0;
        if (n)
            return 1;
        return 0;   // n == 0: the group is optional, so do not build a filter
        break;
. . .
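
To illustrate why a first-character filter can break an optional prefix, here
is a minimal, self-contained sketch. It is a toy model, not the Phobos
implementation: the Piece struct and the functions naiveFilter,
conservativeFilter and mayStartMatch are hypothetical names introduced only
for this example, and the "conservative" variant merely mirrors the spirit of
the patch (give up on the filter when the leading group is optional).

```d
import std.stdio;

// Hypothetical toy model: a pattern is a list of literal pieces.
struct Piece
{
    string text;    // literal text of this piece (assumed non-empty)
    bool optional;  // true models "(text)?"
}

// Naive filter: records the first character of the first piece and
// ignores whether that piece is optional.
bool naiveFilter(const Piece[] pattern, ref bool[256] set)
{
    if (pattern.length == 0)
        return false;
    set[pattern[0].text[0]] = true;
    return true;    // claims the filter is usable
}

// Conservative filter, in the spirit of the patch: report "no usable
// filter" when the leading piece is optional.
bool conservativeFilter(const Piece[] pattern, ref bool[256] set)
{
    if (pattern.length == 0 || pattern[0].optional)
        return false;
    set[pattern[0].text[0]] = true;
    return true;
}

// Returns true if a match attempt at the start of input is allowed.
// Without a usable filter, every position is tried.
bool mayStartMatch(bool usable, ref const(bool[256]) set, string input)
{
    if (!usable)
        return true;
    return input.length > 0 && set[input[0]];
}

void main()
{
    // Models the pattern "(ab)?cd".
    auto pattern = [Piece("ab", true), Piece("cd", false)];

    bool[256] naive, safe;
    bool naiveOk = naiveFilter(pattern, naive);
    bool safeOk = conservativeFilter(pattern, safe);

    // "cd" should match "(ab)?cd", but the naive filter only accepts 'a'.
    writeln(mayStartMatch(naiveOk, naive, "cd")); // false: wrongly skipped
    writeln(mayStartMatch(safeOk, safe, "cd"));   // true: match is attempted
}
```

The conservative variant trades a little speed for correctness: patterns with
an optional leading group simply run without the first-character shortcut.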

I tried it, and it works now.
This may also fix some other regexp bugs that are still open.

Best regards,
Marcello Gnani
