[Issue 9979] Regex bug

d-bugmail at puremagic.com
Mon Apr 22 20:11:54 PDT 2013


http://d.puremagic.com/issues/show_bug.cgi?id=9979



--- Comment #2 from Diggory <diggsey at googlemail.com> 2013-04-22 20:11:53 PDT ---
I've tracked it down to a flaw in the algorithm:

Using | to mark the regex engine's read position in the input, consider:
A|B

The front character is "B", and the engine uses BackLooper to read the previous
character, then compares the two characters to detect a word boundary.
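
The comparison described above can be sketched as follows. This is a hypothetical Python model, not std.regex code: `is_word_char` and `is_word_boundary` are illustrative names, `None` stands for "no character" (start or end of input), and `\w` is approximated as alphanumerics plus underscore.

```python
from typing import Optional


def is_word_char(c: Optional[str]) -> bool:
    # None means "no character" (start or end of the input).
    # \w is approximated here as alphanumerics plus underscore.
    return c is not None and (c.isalnum() or c == "_")


def is_word_boundary(prev: Optional[str], front: Optional[str]) -> bool:
    # A word boundary lies between two characters when exactly one of
    # them is a word character.
    return is_word_char(prev) != is_word_char(front)


print(is_word_boundary("A", "B"))   # False: both are word characters
print(is_word_boundary("B", None))  # True: word character, then end of string
```

Because the result depends on the previous character, reading the wrong one can flip the answer whenever the two candidates differ in "word-ness".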

The problem is that BackLooper uses the index of the input stream as a base,
not the current read position, and the input stream is one character ahead
because "B" has already been read.

Input stream position:
AB|

This is normally not a problem, because BackLooper also has an off-by-one
error: it always reads one character further back than it should. In most
circumstances the two errors cancel out and give the correct result (A).

The problem appears at the end of the string, because the input stream
position cannot advance past the end.

Input stream position and read position:
AB|

In this case the two characters should be "B" and <end of string>, but because
BackLooper reads one character further back, they are "A" and <end of string>,
skipping "B" entirely.
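
The two cases can be reproduced with a small hypothetical model of the positions described above. The function names are illustrative, and the rule that the stream runs one character ahead of the read position but stops at the end of the input is an assumption taken from this description, not from the std.regex source.

```python
from typing import Optional


def stream_pos(text: str, read_pos: int) -> int:
    # Assumed model: the stream is one character ahead of the read
    # position (the front character was already consumed), but it
    # cannot advance past the end of the input.
    return min(read_pos + 1, len(text))


def buggy_prev(text: str, read_pos: int) -> Optional[str]:
    # Described BackLooper behaviour: base = stream position, and it
    # steps back one character too far (hence -2).
    i = stream_pos(text, read_pos) - 2
    return text[i] if i >= 0 else None


def correct_prev(text: str, read_pos: int) -> Optional[str]:
    # The character immediately before the read position.
    i = read_pos - 1
    return text[i] if i >= 0 else None


text = "AB"
# Read position A|B (1), stream at AB| (2): the two errors cancel.
print(buggy_prev(text, 1), correct_prev(text, 1))  # A A
# Read position AB| (2), stream also stuck at 2: "B" is skipped.
print(buggy_prev(text, 2), correct_prev(text, 2))  # A B
```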

This error propagates out as a zero-length match.

The most sensible fix seems to be to correct the off-by-one error in
BackLooper and make it use the read position, rather than the input stream
position, as its base.
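
A sketch of the proposed fix, again as a hypothetical Python model rather than a patch to std.regex. `prev_buggy` models the described behaviour (base = stream position, capped at the end, plus one extra step back), and `prev_fixed` bases the look-back on the read position and steps back exactly one character. The input "a " is made up: in this model, the buggy look-back reports an extra word boundary at the end of the string, which a `\b` search would return as a zero-length match.

```python
from typing import Callable, List, Optional


def is_word_char(c: Optional[str]) -> bool:
    # Approximates \w as alphanumerics plus underscore; None = no character.
    return c is not None and (c.isalnum() or c == "_")


def prev_buggy(text: str, read_pos: int) -> Optional[str]:
    # Described behaviour: base = stream position (one ahead of the read
    # position, capped at the end of input), then one step too far back.
    i = min(read_pos + 1, len(text)) - 2
    return text[i] if i >= 0 else None


def prev_fixed(text: str, read_pos: int) -> Optional[str]:
    # Proposed fix: base = read position, step back exactly one character.
    i = read_pos - 1
    return text[i] if i >= 0 else None


def boundaries(text: str, prev_fn: Callable[[str, int], Optional[str]]) -> List[int]:
    # Every position 0..len(text) where the previous and front characters
    # differ in word-ness, i.e. where \b would match.
    result = []
    for pos in range(len(text) + 1):
        front = text[pos] if pos < len(text) else None
        if is_word_char(prev_fn(text, pos)) != is_word_char(front):
            result.append(pos)
    return result


print(boundaries("a ", prev_buggy))  # [0, 1, 2]
print(boundaries("a ", prev_fixed))  # [0, 1]
```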



More information about the Digitalmars-d-bugs mailing list