Degenerate Regex Case

TheFlyingFiddle via Digitalmars-d-learn digitalmars-d-learn at puremagic.com
Fri Apr 24 16:56:33 PDT 2015


On Friday, 24 April 2015 at 18:28:16 UTC, Guillaume wrote:
> Hello, I'm trying to make a regex comparison with D, based off 
> of this article: https://swtch.com/~rsc/regexp/regexp1.html
>
> I've written my code like so:
>
> import std.stdio, std.regex;
>
> void main(string argv[]) {
>
> 	string m = argv[1];
> 	auto p = 
> ctRegex!("a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa");
> 	if (match(m, p)) {
> 		writeln("match");
> 	} else {
> 		writeln("no match");
> 	}
>
> }
>
> And the compiler goes into swap. Doing it at runtime is no 
> better. I was under the impression that this particular regex 
> was used for showcasing the Thompson NFA which D claims to be 
> using.

The regex 
"a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa" 
can be simplified to "a{30,60}" (if i counted correctly).

The regex "a{30,60}" works fine.

[Speculation]
I don't have a good understanding of how D's regex engine work 
but I am guessing that it does not do any simplification of the 
regex input causing it to generate larger engines for each 
additional ? symbol. Thus needing more memory. Eventually as in 
this case the compiler runs out of memory.






More information about the Digitalmars-d-learn mailing list