Regex performance
Jay Norwood
jayn at prismnet.com
Mon Mar 26 09:00:49 PDT 2012
On Sunday, 25 March 2012 at 16:31:40 UTC, James Blewitt wrote:
> I'm currently trying to figure out what I'm doing differently
> in my original program. At this point I am assuming that I
> have an error in my code which causes the D program to do much
> more work than its Ruby counterpart (although I am currently
> unable to find it).
>
> When I know more I will let you know.
>
> James Blewitt
That is the same kind of thing I was seeing with very simple
regular expressions. The run-time regex was on the order of 30
times slower than hand-written code for finding words in strings,
and ctRegex was on the order of 13 times slower. The times below
are from parallel processing of 100 MB of text files, just
finding the word boundaries. I uploaded the tests to
https://github.com/jnorwood/wc_test
I believe that in all these cases the files are being cached by
the OS, since I saw the same measurements when reading from a
ramdisk created with imdisk. So in these cases the file reads
account for about 30 ms of the result. The rest is CPU time
spent finding the words.
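For readers who have not opened the repository, here is a minimal
sketch of the kind of comparison being timed. It is not the code
from wc_test: the function names, the letters-only definition of a
word, and the pattern `[a-zA-Z]+` are all assumptions made for
illustration.

```d
import std.ascii : isAlpha;
import std.range : walkLength;
import std.regex : ctRegex, matchAll, regex;

// Hand-written scan: count each transition from non-letter to letter.
size_t countWordsByHand(string text)
{
    size_t count = 0;
    bool inWord = false;
    foreach (char c; text)
    {
        immutable letter = isAlpha(c);
        if (letter && !inWord)
            ++count;
        inWord = letter;
    }
    return count;
}

// Same count with a regex compiled at run time.
size_t countWordsByRegex(string text)
{
    auto re = regex(`[a-zA-Z]+`);
    return matchAll(text, re).walkLength;
}

// Same count with a regex compiled at compile time.
size_t countWordsByCtRegex(string text)
{
    auto re = ctRegex!(`[a-zA-Z]+`);
    return matchAll(text, re).walkLength;
}

void main()
{
    // Hypothetical sample input; the real tests read text files.
    immutable sample = "the quick brown fox";
    assert(countWordsByHand(sample) == 4);
    assert(countWordsByRegex(sample) == 4);
    assert(countWordsByCtRegex(sample) == 4);
}
```

All three functions return the same count, so any timing
difference between them comes from the matching machinery rather
than from the amount of text examined.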
This is with the default 7 threads
finished wcp_wcPointer! time: 98 ms
finished wcp_wcCtRegex! time: 1300 ms
finished wcp_wcRegex! time: 2946 ms
finished wcp_wcRegex2! time: 2687 ms
finished wcp_wcSlices! time: 157 ms
finished wcp_wcStdAscii! time: 225 ms
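As a quick check of the "30 times" and "13x" figures, the ratios
can be worked out from the 7-thread times above. The helper name
slowdown is only for illustration.

```d
import std.stdio : writefln;

// Ratio of a measured time to a baseline time, both in milliseconds.
double slowdown(double ms, double baselineMs)
{
    return ms / baselineMs;
}

void main()
{
    // Times in ms copied from the 7-thread run above.
    immutable double pointerMs = 98;
    immutable double ctRegexMs = 1300;
    immutable double regexMs = 2946;

    writefln("ctRegex vs pointer: %.1fx", slowdown(ctRegexMs, pointerMs)); // 13.3x
    writefln("regex vs pointer:   %.1fx", slowdown(regexMs, pointerMs));   // 30.1x
}
```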
This is processing the same data with 1 thread
finished wcp_wcPointer! time: 188 ms
finished wcp_wcCtRegex! time: 2219 ms
finished wcp_wcRegex! time: 5951 ms
finished wcp_wcRegex2! time: 5502 ms
finished wcp_wcSlices! time: 318 ms
finished wcp_wcStdAscii! time: 446 ms
And this is processing the same data with 13 threads
finished wcp_wcPointer! time: 93 ms
finished wcp_wcCtRegex! time: 1110 ms
finished wcp_wcRegex! time: 2531 ms
finished wcp_wcRegex2! time: 2321 ms
finished wcp_wcSlices! time: 136 ms
finished wcp_wcStdAscii! time: 200 ms
The only change to the uploaded program is to add the suggested
defaultPoolThreads(13);
at the start of main to change the default thread count of the
std.parallelism TaskPool.
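A minimal sketch of how that setting is used, assuming the work is
driven through the global taskPool from std.parallelism. The array
and the loop body are hypothetical stand-ins for the per-file word
counting. Without the setting, the default is totalCPUs - 1 worker
threads, which would match the 7-thread default above on a machine
with 8 logical cores.

```d
import std.parallelism : defaultPoolThreads, taskPool;

void main()
{
    // Must run before the first use of taskPool: the global pool is
    // created lazily and reads defaultPoolThreads at that point.
    defaultPoolThreads = 13;

    // Hypothetical stand-in for the per-file work.
    auto results = new size_t[](100);
    foreach (i, ref r; taskPool.parallel(results))
        r = i * 2;

    assert(taskPool.size == 13);   // number of worker threads in the pool
    assert(results[99] == 198);
}
```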