[Issue 13532] std.regex performance (enums; regex vs ctRegex)

via Digitalmars-d-bugs digitalmars-d-bugs at puremagic.com
Sat Sep 27 05:55:02 PDT 2014


https://issues.dlang.org/show_bug.cgi?id=13532

--- Comment #3 from Dmitry Olshansky <dmitry.olsh at gmail.com> ---
(In reply to Vladimir Panteleev from comment #0)
> The first surprise for me was that declaring a regex object (either Regex or
> StaticRegex) with "enum" was so much slower. It makes sense now that I think
> about it: creating a struct literal inside a loop will be more expensive than
> referencing one already residing somewhere in memory. Perhaps it might be
> worth mentioning in the documentation to avoid using enum with compiled regexes.

It's a common anti-pattern. It's the same issue as with array literals, and
with anything else that takes some time to compute or that allocates. The
regex function call does both.
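
A minimal sketch of the difference, using the array-literal case because it is
easy to observe directly. The names enumTable, staticTable and countNumeric are
made up for illustration and are not part of Phobos. The last function shows
the usual remedy for regexes: build the object once and reuse it.

```d
import std.regex : matchFirst, regex;

// A manifest constant: the literal is re-created at every use site.
enum int[] enumTable = [1, 2, 3];

// A real variable with one address, initialized at compile time.
static immutable int[] staticTable = [1, 2, 3];

// Hypothetical helper: counts the lines that contain a run of digits.
size_t countNumeric(string[] lines)
{
    // Build the regex once, outside the loop, and reuse it.
    auto re = regex(`[0-9]+`);
    size_t hits;
    foreach (line; lines)
        if (!matchFirst(line, re).empty)
            ++hits;
    return hits;
}

void main()
{
    auto a = enumTable; // allocates a fresh array
    auto b = enumTable; // allocates another one
    assert(a.ptr !is b.ptr);

    auto c = staticTable; // both slices refer to the same data
    auto d = staticTable;
    assert(c.ptr is d.ptr);

    assert(countNumeric(["abc", "a1", "42"]) == 2);
}
```

An enum regex behaves like enumTable here: every mention of it inside a loop
rebuilds the object.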

It's worth adding a note though, feel free to create a pull. I'm not sure I'll
get to it soon.

(In reply to Vladimir Panteleev from comment #2)
> Well, it's slower for this particular case, not necessarily in general.
> CCing Dmitry.

That's right. The problem is the simple backtracking engine used by the CTFE
version. That is an unfortunate historical choice: I'd pick the other of the
two engines if I could go back in time.
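
For reference, a small sketch of the two ways to obtain a regex being compared
here. Both are used through the same matching functions, so switching between
them to measure a particular pattern is a one-line change. The helper names and
the pattern are made up for illustration, and the regex is built inside each
helper only to keep the example short.

```d
import std.regex : ctRegex, matchFirst, regex;

// Hypothetical helper: the pattern is compiled when the program runs.
string firstFieldRuntime(string input)
{
    auto re = regex(`(\d+)-(\d+)`);
    auto m = matchFirst(input, re);
    return m.empty ? null : m[1];
}

// Hypothetical helper: the matcher is generated at compile time via CTFE.
string firstFieldStatic(string input)
{
    auto re = ctRegex!(`(\d+)-(\d+)`);
    auto m = matchFirst(input, re);
    return m.empty ? null : m[1];
}

void main()
{
    // Both variants must agree on results; only their speed may differ.
    assert(firstFieldRuntime("10-20") == "10");
    assert(firstFieldStatic("10-20") == "10");
    assert(firstFieldRuntime("none") is null);
    assert(firstFieldStatic("none") is null);
}
```

Which of the two is faster depends on the pattern and the input, so it is
worth timing both on representative data.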

(In reply to hsteoh from comment #1)
> ctRegex is slower than regular regex?! Whoa. That just sounds completely
> wrong. What's the cause of this slowdown? I thought the whole point of
> ctRegex is to outperform runtime regex by making use of compile-time
> optimization. Whatever happened to that?? If this is the case, we might as
> well throw ctRegex away.

I'm fully aware of this. Unfortunately, adding yet another engine (a C-T
"robust" engine) is increasingly a maintenance disaster.

Consider also that working on compile-time generated regex is a nightmare:
~5-10 minutes to run all tests, and constant out-of-memory conditions.
Doubling the amount of work done at CTFE is something DMD CAN'T handle at the
moment.

Another problem is that regex has accumulated a lot of technical debt, and
needs a serious amount of refactoring before piling up more stuff. Then, with
a modular design (the one I roughly outlined in my talk), we can put more and
more components into it. Sadly all of this goes very slooowly.
