std.regex is slow [was DUB - call to arms]
Dmitry Olshansky
dmitry.olsh at gmail.com
Thu Apr 25 16:49:27 UTC 2019
On Wednesday, 17 April 2019 at 20:32:28 UTC, Julian wrote:
> On Wednesday, 17 April 2019 at 20:10:08 UTC, JN wrote:
>
> Consider std.regex and
> https://github.com/jrfondren/topsender-bench
> The fastest std.regex option is more than 10x slower than libc
> regex, which is already too slow to seriously use for anything
> but once-off tasks.
Author of std.regex here. It's been a while since I monitored its
performance.
Still, let's drill down. Caveat emptor: I do not have your
datafile, so I produced one by sampling example lines from the
web.
First, your flags are a bit off for DMD; use the following:
dmd -release -inline -O
I know, not very intuitive. This more than doubles performance
with DMD. std.regex is a templated library (sadly), so its speed
depends heavily on passing the right flags at the application
level :(
For LDC I used the following (I can't remember whether -O implies
-release):
ldc2 -release -O
Second, when matching a regex once per line, it's best to use
`matchFirst` instead of `match`: `matchFirst` caches the engine
between calls, while `match` is intended to plow through large
chunks of input, such as iterating over matches in a
memory-mapped file, and therefore creates a new engine on each
call.
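A minimal sketch of the per-line pattern described above, assuming a hypothetical `From: <sender>` line format and a hypothetical `countSenders` helper (neither comes from the benchmark repository). The regex is compiled once outside the loop, and each line goes through `matchFirst`, which returns a `Captures` object that is empty when the line does not match.

```d
import std.regex : matchFirst, regex;
import std.stdio : writeln;

/// Hypothetical helper: counts how often each sender appears,
/// matching one line at a time. The `^From: (\S+)` pattern is an
/// assumed input format, not the benchmark's actual one.
size_t[string] countSenders(string[] lines)
{
    // Compile the pattern once, outside the loop; compiling it per
    // line would add avoidable overhead.
    auto re = regex(`^From: (\S+)`);
    size_t[string] counts;
    foreach (line; lines)
    {
        // Per the post, matchFirst is the better fit for per-line
        // matching. It returns Captures, empty when there is no match.
        auto c = matchFirst(line, re);
        if (!c.empty)
            counts[c[1]]++; // c[1] is the first capture group (sender)
    }
    return counts;
}

void main()
{
    auto counts = countSenders([
        "From: alice@example.com",
        "Subject: hi",
        "From: bob@example.com",
        "From: alice@example.com",
    ]);
    writeln(counts["alice@example.com"]); // prints 2
}
```

Build it with the flags suggested above (`dmd -release -inline -O` or `ldc2 -release -O`) when measuring, since std.regex is templated and instantiated in the application.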
With these two tweaks I get a respectable speed, within 1.5x of
PCRE/JIT.
IIRC, the PCRE_JIT option doesn't work with Unicode, whereas
std.regex supports Unicode by default.
In general I agree: std.regex needs more love. A casual look at
the disassembly shows some degradation compared to a couple of
years back. Truth is, code like that needs constant tweaking.
P.S. I lack the time and energy to improve regex, especially in
std proper. I do hope to get back to my rewind-regex experiments,
though. JIT compilation is on the list, mostly to avoid relying on
the compiler and to be more aggressive with low-level tricks.