Regex performance

Nick Sabalausky a at a.a
Sat Mar 24 17:25:05 PDT 2012


"Kapps" <opantm2+spam at gmail.com> wrote in message 
news:yudtvjsuhhimrhqaixos at forum.dlang.org...
> On Saturday, 24 March 2012 at 23:06:54 UTC, Andrei Alexandrescu wrote:
>> This might be worth looking into. Dmitry?
>>
>> http://jblewitt.com/blog/?p=462
>>
>>
>> Andrei
>
> A difference of that amount is likely expecting something like 
> regex("Blah") to not have to create a new regex struct each time, 
> something which I'm guessing Ruby does (as do other standard libraries 
> like .NET).

Yeah, I agree that's what it sounds like. I tried to post a response, but I'm 
just getting this result (and yes, this is with JS enabled):

--------------------------------------------
Asirra validation failed!
ticket = start ASIRRAVALIDATION ir=cd ir  data=
  start RESULT ir=1cd ir 1 data=Failend Resource id #62cd ir 0 data=
  start DEBUG ir=cd ir  data=exceptions.Exception: invalid ticket formatend 
Resource id #62cd ir 0 data=
end Resource id #62XML:

  Fail
  exceptions.Exception: invalid ticket format
--------------------------------------------

If it's working for anyone else, maybe you could post it for me?

--------------------------------------------
A few things on the D version:

- Make sure you're using a recent version of DMD. The regex engine was 
overhauled fairly recently (I forget exactly which version, but the latest, 
2.058, definitely has it, along with some bugfixes).

- Make sure you're using "std.regex", not the deprecated "std.regexp".

- It sounds like this may be your main problem: Make sure you're not 
re-creating the same regex multiple times:

// Bad: the regex is compiled again on every iteration
foreach(str; strings)
{
    auto result = match(str, regex("abc.*def"));
}

// Good: the regex is compiled once, outside the loop
auto myRegex = regex("abc.*def");
foreach(str; strings)
{
    auto result = match(str, myRegex);
}

Some regex engines cache the regex, but D's doesn't ATM. I think that'll 
likely get fixed though.

- Better yet, if your regex string is a literal (or otherwise known or 
computable at compile-time) as above, use the compile-time version instead:

auto myRegex = ctRegex!"abc.*def";
// [...same 'foreach' loop as before...]
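
Putting the points above together, a rough, self-contained sketch might look 
like the following. The helper names countMatches and countMatchesCt are made 
up for illustration; regex, ctRegex and match are the actual std.regex calls. 
Treat it as a sketch, not a benchmark:

```d
import std.regex;

// Hypothetical helper: compiles the pattern once at run time,
// then reuses the compiled regex for every input string.
size_t countMatches(string[] strings)
{
    auto myRegex = regex("abc.*def");
    size_t hits = 0;
    foreach (str; strings)
    {
        if (!match(str, myRegex).empty)
            ++hits;
    }
    return hits;
}

// Hypothetical helper: same loop, but the pattern is a literal,
// so the matcher can be generated at compile time with ctRegex.
size_t countMatchesCt(string[] strings)
{
    auto myRegex = ctRegex!"abc.*def";
    size_t hits = 0;
    foreach (str; strings)
    {
        if (!match(str, myRegex).empty)
            ++hits;
    }
    return hits;
}
```

Call either helper from your own main() with the array of strings you want 
to test. Both should give the same count; the only difference is when the 
regex gets built.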
--------------------------------------------



