Regex performance
Nick Sabalausky
a at a.a
Sat Mar 24 17:25:05 PDT 2012
"Kapps" <opantm2+spam at gmail.com> wrote in message
news:yudtvjsuhhimrhqaixos at forum.dlang.org...
> On Saturday, 24 March 2012 at 23:06:54 UTC, Andrei Alexandrescu wrote:
>> This might be worth looking into. Dmitry?
>>
>> http://jblewitt.com/blog/?p=462
>>
>>
>> Andrei
>
> A difference of that amount is likely expecting something like
> regex("Blah") to not have to create a new regex struct each time,
> something which I'm guessing Ruby does (as do other standard libraries
> like .NET).
Yea, I agree that's what it sounds like. I tried to post a response, but I'm
just getting this result (and yes, this is with JS enabled):
--------------------------------------------
Asirra validation failed!
ticket = start ASIRRAVALIDATION ir=cd ir data=
start RESULT ir=1cd ir 1 data=Failend Resource id #62cd ir 0 data=
start DEBUG ir=cd ir data=exceptions.Exception: invalid ticket formatend
Resource id #62cd ir 0 data=
end Resource id #62XML:
Fail
exceptions.Exception: invalid ticket format
--------------------------------------------
If it's working for anyone else, maybe you could post it for me?:
--------------------------------------------
A few things on the D version:
- Make sure you're using a recent version of DMD. The regex engine was
overhauled fairly recently (I forget exactly which version, but the latest,
2.058, definitely has it, along with some bugfixes).
- Make sure you're using "std.regex", not the deprecated "std.regexp".
- It sounds like this may be your main problem: Make sure you're not
re-creating the same regex multiple times:
    // Bad:
    foreach(str; strings)
    {
        auto result = match(str, regex("abc.*def"));
    }

    // Good:
    auto myRegex = regex("abc.*def");
    foreach(str; strings)
    {
        auto result = match(str, myRegex);
    }
Some regex engines cache the regex, but D's doesn't ATM. I think that'll
likely get fixed though.
- Even better yet, if your regex string is a literal (or otherwise known or
computable at compile-time) as above, use the compile-time version instead:
    auto myRegex = ctRegex!"abc.*def";
    // [...same 'foreach' loop as before...]
--------------------------------------------
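For anyone who wants something they can compile and time, here's a minimal sketch pulling those points together. The function names (countMatches, countMatchesCt) and the sample strings are made up for illustration, and it assumes a DMD recent enough to ship the overhauled std.regex with ctRegex:

```d
import std.regex : regex, ctRegex, match;
import std.stdio : writeln;

// Hypothetical helper: counts the inputs that match, building the
// regex once at run time and reusing it for every string.
size_t countMatches(string[] strings)
{
    auto myRegex = regex("abc.*def"); // built once, outside the loop
    size_t hits = 0;
    foreach (str; strings)
    {
        if (!match(str, myRegex).empty)
            ++hits;
    }
    return hits;
}

// Hypothetical helper: same loop, but the pattern is a literal,
// so the regex is generated at compile time instead.
size_t countMatchesCt(string[] strings)
{
    auto myRegex = ctRegex!"abc.*def";
    size_t hits = 0;
    foreach (str; strings)
    {
        if (!match(str, myRegex).empty)
            ++hits;
    }
    return hits;
}

void main()
{
    // Illustrative inputs only.
    auto strings = ["abc123def", "nothing here", "xxabcdefxx"];
    writeln(countMatches(strings));
    writeln(countMatchesCt(strings));
}
```

Both functions should report the same count; the only difference is when the regex gets built, so timing the two against the "Bad" loop above is a quick way to see how much of the gap comes from re-creating the regex.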