regex - match/matchAll and bmatch - different output

Ivan Kazmenko via Digitalmars-d-learn digitalmars-d-learn at puremagic.com
Sat Jan 2 09:33:41 PST 2016


On Friday, 1 January 2016 at 12:29:01 UTC, anonymous wrote:
> On 30.12.2015 12:06, Ivan Kazmenko wrote:
>> As you can see, bmatch (usage discouraged in the docs) gives me the
>> result I want, but match (also discouraged) and matchAll (way to go)
>> don't.
>>
>> Am I misusing matchAll, or is this a bug?
>
> The `\1` there is a backreference. Backreferences are not part 
> of regular expressions, in the sense that they allow you to 
> describe more than regular languages. [1]
>
> As far as I know, bmatch uses a widespread matching mechanism, 
> while match/matchAll use a different, less common one. It 
> wouldn't surprise me if match/matchAll simply didn't support 
> backreferences.
>
> Backreferences are not documented, as far as I can see, but 
> they're working in other patterns. So, yeah, this is possibly a 
> bug.
>
>
> [1] https://en.wikipedia.org/wiki/Regular_expression#Patterns_for_non-regular_languages

The overview by the module author 
(http://dlang.org/regular-expression.html) does mention in the 
last paragraph that backreferences are supported.  They look like 
a common feature in other programming languages, too.

The "\1" part works correctly in other cases: the pattern matches 
"abab", "abxab" and "ababx", but not "abac".  This means it is 
probably intended to work, and handling "xabab" incorrectly is a bug.
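
For comparison, here is a minimal Python sketch of what a 
backtracking engine returns on these inputs.  The pattern 
`(..).*\1` is my assumption about the shape of the 8-character 
pattern (the exact one is in the original post, not repeated 
here), and `find_repeated_pair` is just a hypothetical helper name.

```python
import re

# Assumed pattern: two characters, anything, then the same two characters again.
PATTERN = re.compile(r"(..).*\1")

def find_repeated_pair(text):
    """Return the first substring matching PATTERN, or None if there is none."""
    m = PATTERN.search(text)
    return m.group(0) if m else None

if __name__ == "__main__":
    # The inputs discussed above, plus the problematic "xabab".
    for text in ["abab", "abxab", "ababx", "xabab", "abac"]:
        print(text, "->", find_repeated_pair(text))
```

With Python's re module, "xabab" yields "abab" and "abac" yields 
no match, which is what I would expect from matchAll as well.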

Also, as I understand it from the docs, matchAll/matchFirst use 
the most appropriate of match/bmatch internally.  So if match does 
not properly support this particular backreference but bmatch 
does, the bug is in picking the wrong one to handle the pattern.

At any rate, a wrong result with an 8-character pattern gives the 
impression that "regex doesn't work", and I hope something can be 
done about it.

