Regex performance

Jay Norwood jayn at prismnet.com
Mon Mar 26 09:00:49 PDT 2012


On Sunday, 25 March 2012 at 16:31:40 UTC, James Blewitt wrote:
> I'm currently trying to figure out what I'm doing differently 
> in my original program.  At this point I am assuming that I 
> have an error in my code which causes the D program to do much 
> more work than its Ruby counterpart (although I am currently 
> unable to find it).
>
> When I know more I will let you know.
>
> James Blewitt

I saw the same kind of thing with very simple regular 
expressions. For finding words in strings, the run-time regex was 
roughly 30 times slower than hand-written code, and ctRegex 
roughly 13 times slower. The times below are from parallel 
processing of 100MB of text files, just finding the word 
boundaries. I uploaded the tests to 
https://github.com/jnorwood/wc_test
I believe the files are being cached by the OS in all these cases, 
since I measured the same times with the files on a RAM disk 
created with ImDisk. So the file reads account for only about 
30 ms of each result; the rest is CPU time spent finding the words.
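For anyone who hasn't opened the repository, here is a minimal sketch of the kinds of approaches being compared. It is not the code in wc_test: countWordsRegex, countWordsCtRegex and countWordsHand are hypothetical names, the input is assumed to be ASCII, and it is written against a recent Phobos (matchAll may not exist in the 2012 release). The regex versions walk every match of \w+; the hand-coded version makes one pass over the bytes and counts each transition into a word.

```d
// Sketch only: assumes ASCII input; not the code from wc_test.
// countWordsRegex, countWordsCtRegex and countWordsHand are hypothetical names.
import std.ascii : isAlphaNum;
import std.regex : ctRegex, matchAll, regex;
import std.stdio : writeln;

// Run-time compiled regex: count every match of \w+.
// The regex is built per call for simplicity; hoist it out of hot loops.
size_t countWordsRegex(string text)
{
    auto re = regex(`\w+`);
    size_t n = 0;
    foreach (m; matchAll(text, re))
        ++n;
    return n;
}

// Same pattern, but ctRegex generates the matcher at compile time.
size_t countWordsCtRegex(string text)
{
    auto re = ctRegex!(`\w+`);
    size_t n = 0;
    foreach (m; matchAll(text, re))
        ++n;
    return n;
}

// Hand-written scan: one pass over the bytes, counting each transition
// from a non-word byte into a word byte. '_' is included because \w
// also matches it. Non-ASCII bytes are treated as separators.
size_t countWordsHand(string text)
{
    size_t n = 0;
    bool inWord = false;
    foreach (char c; text) // iterate code units, so no UTF decoding
    {
        immutable bool w = isAlphaNum(c) || c == '_';
        if (w && !inWord)
            ++n;
        inWord = w;
    }
    return n;
}

void main()
{
    string text = "the quick brown_fox, jumps over 2 lazy dogs";
    // prints "8 8 8"
    writeln(countWordsRegex(text), " ", countWordsCtRegex(text), " ",
            countWordsHand(text));
}
```

All three agree on ASCII text; the hand-coded loop avoids the per-match machinery of the regex engine, which is one plausible source of the gap in the timings.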

This is with the default 7 threads:

finished wcp_wcPointer! time: 98 ms
finished wcp_wcCtRegex! time: 1300 ms
finished wcp_wcRegex! time: 2946 ms
finished wcp_wcRegex2! time: 2687 ms
finished wcp_wcSlices! time: 157 ms
finished wcp_wcStdAscii! time: 225 ms


This is the same data processed with 1 thread:

finished wcp_wcPointer! time: 188 ms
finished wcp_wcCtRegex! time: 2219 ms
finished wcp_wcRegex! time: 5951 ms
finished wcp_wcRegex2! time: 5502 ms
finished wcp_wcSlices! time: 318 ms
finished wcp_wcStdAscii! time: 446 ms

And this is the same data processed with 13 threads:

finished wcp_wcPointer! time: 93 ms
finished wcp_wcCtRegex! time: 1110 ms
finished wcp_wcRegex! time: 2531 ms
finished wcp_wcRegex2! time: 2321 ms
finished wcp_wcSlices! time: 136 ms
finished wcp_wcStdAscii! time: 200 ms

The only change from the uploaded program is adding the 
suggested
defaultPoolThreads(13);
at the start of main to change the default TaskPool thread 
count.
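As a rough illustration of where that call goes, here is a sketch that counts words in several files on the default taskPool. The names countWordsHand and countWordsParallel and the file handling are hypothetical, not taken from wc_test. In std.parallelism the default pool has totalCPUs - 1 worker threads (the calling thread also does work), so a default of 7 corresponds to totalCPUs being 8. Setting defaultPoolThreads only takes effect if it happens before taskPool is first used, which is why it belongs at the start of main.

```d
// Sketch only: file handling and names are hypothetical, not from wc_test.
import std.algorithm : sum;
import std.ascii : isAlphaNum;
import std.file : readText;
import std.parallelism : defaultPoolThreads, parallel;
import std.stdio : writeln;

// Single-pass, ASCII-only word counter ('_' counts as a word character).
size_t countWordsHand(string text)
{
    size_t n = 0;
    bool inWord = false;
    foreach (char c; text)
    {
        immutable bool w = isAlphaNum(c) || c == '_';
        if (w && !inWord)
            ++n;
        inWord = w;
    }
    return n;
}

// Count each text on the default taskPool. Each iteration writes only
// its own slot of counts, so no locking is needed.
size_t countWordsParallel(string[] texts)
{
    auto counts = new size_t[texts.length];
    foreach (i, ref t; parallel(texts))
        counts[i] = countWordsHand(t);
    return sum(counts);
}

void main(string[] args)
{
    // Must run before anything first uses taskPool; afterwards it has
    // no effect on the already-created pool.
    defaultPoolThreads = 13;

    string[] texts;
    foreach (name; args[1 .. $])
        texts ~= readText(name);
    writeln(countWordsParallel(texts), " words");
}
```

Reading the files before the parallel loop keeps the sketch simple; given that the reads were only about 30 ms here, the counting loop is the part worth spreading across threads.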