std.regex is slow [was DUB - call to arms]

Dmitry Olshansky dmitry.olsh at gmail.com
Thu Apr 25 16:49:27 UTC 2019


On Wednesday, 17 April 2019 at 20:32:28 UTC, Julian wrote:
> On Wednesday, 17 April 2019 at 20:10:08 UTC, JN wrote:
>
> Consider std.regex and 
> https://github.com/jrfondren/topsender-bench
> The fastest std.regex option is more than 10x slower than libc
> regex, which is already too slow to seriously use for anything but
> once-off tasks.

Author of std.regex here. It's been a while since I monitored its 
performance.
Still, let's drill down. Caveat emptor: I do not have your 
datafile, but I produced one by sampling example lines from the 
web.

First, your flags are a bit off for DMD, use the following:

dmd -release -inline -O

I know, not very intuitive. This more than doubles performance 
with dmd. std.regex is a templated library (sadly), so it heavily 
depends on passing the right flags at the application level :(

For LDC I used the following (can't remember if -O implies 
-release):

ldc2 -release -O

Second, if doing a regex match per line, it's best to use 
`matchFirst` instead of `match`: `matchFirst` caches the engine 
in between calls. `match` is intended to plow through large 
chunks, such as iterating over matches in a memory-mapped file, 
and therefore creates a new engine on each call.
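
A minimal sketch of the per-line pattern, to make the second tweak 
concrete. The log format, the pattern and the `senders` helper are 
made up for illustration; none of them come from the benchmark 
repository.

```d
import std.regex : regex, matchFirst;
import std.stdio : writeln;

// Hypothetical helper: returns the first capture group of the first
// match on each line, skipping lines that do not match.
string[] senders(string[] lines)
{
    // Compile the pattern once, outside the loop.
    auto re = regex(`from=<([^>]+)>`);
    string[] found;
    foreach (line; lines)
    {
        // One matchFirst call per line, as suggested above.
        auto m = matchFirst(line, re);
        if (!m.empty)
            found ~= m[1]; // m[0] is the whole match, m[1] the first group
    }
    return found;
}

void main()
{
    // Made-up sample lines standing in for a real datafile.
    auto lines = [
        "2019-04-17 host mta: from=<alice@example.com> size=1024",
        "2019-04-17 host mta: status=deferred",
        "2019-04-17 host mta: from=<bob@example.com> size=2048",
    ];
    writeln(senders(lines));
}
```

To try it with the flags mentioned earlier, something like 
`dmd -release -inline -O sketch.d` should do, where `sketch.d` is 
just a placeholder file name.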

With these two tweaks I get a respectable speed of 1.5x of 
PCRE/JIT.
IIRC the PCRE_JIT option doesn't work for Unicode, and std.regex 
supports Unicode by default.

In general I agree - std.regex needs more love, a casual look at 
the disassembly shows some degradation compared to a couple of 
years back. Truth is, code like that needs constant tweaking.

P.S. I lack time or energy to improve on regex, esp. in std 
proper. I hope to get back to my experiments on rewind-regex 
though. JIT compilation is on the list, mostly to avoid reliance 
on the compiler and to be more aggressive with low-level tricks.



