Degenerate Regex Case
TheFlyingFiddle via Digitalmars-d-learn
digitalmars-d-learn at puremagic.com
Fri Apr 24 16:56:33 PDT 2015
On Friday, 24 April 2015 at 18:28:16 UTC, Guillaume wrote:
> Hello, I'm trying to make a regex comparison with D, based off
> of this article: https://swtch.com/~rsc/regexp/regexp1.html
>
> I've written my code like so:
>
> import std.stdio, std.regex;
>
> void main(string argv[]) {
>
> string m = argv[1];
> auto p =
> ctRegex!("a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa");
> if (match(m, p)) {
> writeln("match");
> } else {
> writeln("no match");
> }
>
> }
>
> And the compiler goes into swap. Doing it at runtime is no
> better. I was under the impression that this particular regex
> was used for showcasing the Thompson NFA which D claims to be
> using.
The regex
"a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
can be simplified to "a{30,60}" (if i counted correctly).
The regex "a{30,60}" works fine.
[Speculation]
I don't have a good understanding of how D's regex engine work
but I am guessing that it does not do any simplification of the
regex input causing it to generate larger engines for each
additional ? symbol. Thus needing more memory. Eventually as in
this case the compiler runs out of memory.
More information about the Digitalmars-d-learn
mailing list