Trouble with regex backreferencing
Murp via Digitalmars-d-learn
digitalmars-d-learn at puremagic.com
Mon Jun 12 07:59:12 PDT 2017
I was experimenting with std.regex, trying to match repeated patterns
on either side of a space, and I came across some unexpected behavior.
import std.regex, std.stdio;

void main()
{
    writeln("ABC ABC CBA".replaceAll(regex(r"([A-Z]) ([A-Z])"), "D"));
    // ABDBDBA
    // Makes sense: replaces the 3 characters around each space with a single D.

    writeln("ABC ABC CBA".replaceAll(regex(r"([A-Z]) \1"), "D"));
    // ABC ABDBA
    // Same idea, but only if the 2 letters around the space are the same.

    writeln("ABC ABC CBA".replaceAll(regex(r"([A-Z]+) \1"), "D"));
    // D CBA
    // Same idea again, but now any number of letters, as long as the same
    // sequence appears on both sides of the space.

    writeln("ABCABC ABC CBA".replaceAll(regex(r"([A-Z]+) \1"), "D"));
    // ABCABC ABC CBA
    // Hold on, shouldn't this be "ABCD CBA"?

    writeln("ABC ABCABC CBA".replaceAll(regex(r"([A-Z]+) \1"), "D"));
    // DABC CBA
    // Works the other way.
}
The problem is that the group should capture whatever text lets the
whole pattern match, including the backreference. Instead, it seems to
capture as much as it can for its first occurrence, with no regard for
how \1 will be used afterwards. As a result, a match is only found when
the entire first word appears at the start of the second word ("ABC
ABCABC"), whereas it should also work the other way round, when the
second word appears at the end of the first ("ABCABC ABC").
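For reference, this is roughly how I understand a leftmost backtracking engine should search for ([A-Z]+) \1: try each start position from left to right, let [A-Z]+ take the longest run of capitals it can, then give letters back one at a time until a space followed by the same text is found. The helper replaceRepeats below is hypothetical and hand-written for this one pattern; it is only a sketch of the expected semantics under that assumption and says nothing about how std.regex is actually implemented.

```d
import std.ascii : isUpper;
import std.stdio : writeln;

/// Hypothetical helper (not part of std.regex): hand-written emulation of
/// how a leftmost, backtracking engine would treat the pattern ([A-Z]+) \1.
/// Returns a copy of `s` with every non-overlapping match replaced by `rep`.
string replaceRepeats(string s, string rep)
{
    string result;
    size_t pos = 0;    // where the next match attempt starts
    size_t copied = 0; // end of the text already copied into `result`
    outer:
    while (pos < s.length)
    {
        // Greedy step: [A-Z]+ consumes the whole run of capitals at `pos`.
        size_t end = pos;
        while (end < s.length && isUpper(s[end]))
            ++end;
        // Backtracking step: give letters back one at a time, longest first.
        for (size_t len = end - pos; len > 0; --len)
        {
            auto group = s[pos .. pos + len]; // capture group 1
            auto rest = s[pos + len .. $];
            // The pattern needs a space, then \1 (the same text again).
            if (rest.length > len && rest[0] == ' ' && rest[1 .. 1 + len] == group)
            {
                result ~= s[copied .. pos] ~ rep;
                pos = copied = pos + 2 * len + 1; // resume after the match
                continue outer;
            }
        }
        ++pos; // no match starting here: try the next start position
    }
    return result ~ s[copied .. $];
}

void main()
{
    writeln(replaceRepeats("ABC ABC CBA", "D"));    // D CBA
    writeln(replaceRepeats("ABCABC ABC CBA", "D")); // ABCD CBA
    writeln(replaceRepeats("ABC ABCABC CBA", "D")); // DABDBA
}
```

Under those assumed semantics the first two inputs give "D CBA" and "ABCD CBA". The last input also contains a second match, "C C" (the C that ends "ABCABC" and the C that starts "CBA"), so the sketch prints "DABDBA" rather than the "DABC CBA" that std.regex produced.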
Is there any hack, however gross, to get around this? And if this is
intended behavior for some reason, why?