Degenerate Regex Case

Dmitry Olshansky via Digitalmars-d-learn digitalmars-d-learn at puremagic.com
Sat Apr 25 02:30:53 PDT 2015


On Friday, 24 April 2015 at 18:28:16 UTC, Guillaume wrote:
> Hello, I'm trying to make a regex comparison with D, based off 
> of this article: https://swtch.com/~rsc/regexp/regexp1.html
>
> I've written my code like so:
>
> import std.stdio, std.regex;
>
> void main(string argv[]) {
>
> 	string m = argv[1];
> 	auto p = 
> ctRegex!("a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa");
> 	if (match(m, p)) {
> 		writeln("match");
> 	} else {
> 		writeln("no match");
> 	}
>
> }

>
> And the compiler goes into swap. Doing it at runtime is no 
> better. I was under the impression that this particular regex 
> was used for showcasing the Thompson NFA which D claims to be 
> using.
>

A quick investigation shows that it gets stuck at the end of 
pattern compilation stage.

The problem is that as a last pass D's regex goes to optimize the 
pattern to construct simple bit-scanning engine as approximation 
for prefix of original pattern. And that process is a lot like 
Thompson NFA ... _BUT_ the trick of merging equivalent threads 
wasn't applied there.

So in short: file a bug, optimizer absolutely should do 
de-duplication of threads.


> The golang code version of this runs fine, which makes me think 
> that maybe D isn't using the correct regex engine for this 
> particular regex. Or perhaps I'm using this wrong?

It uses 2 kinds of engines, run-time one is Thompson NFA. 
Compile-time is (for now) still backtracking.

---
Dmitry Olshansky


More information about the Digitalmars-d-learn mailing list