std.regex is fat
Chris Katko
ckatko at gmail.com
Sun Oct 14 10:07:15 UTC 2018
On Sunday, 14 October 2018 at 03:26:33 UTC, Adam D. Ruppe wrote:
> On Sunday, 14 October 2018 at 03:07:59 UTC, Chris Katko wrote:
>> For comparison, I just tested and grep uses about 4 MB of RAM
>> to run.
>
> Running and compiling are two entirely different things.
> Running the D regex code should be comparable, but compiling it
> is slow, in great part because of internal templates...
>
> There was an effort to speed up the template code, but it is
> still not complete.
I know that. I figured people would miss my point, though, so I
should have clarified. That's why I said it's likely the
templates/DMD that are exploding, not the actual regex matching.
A simple program takes ~100-150 MB of RAM to compile. Adding a
single regex (not a compiled regex) balloons that to 550 MB and
5 seconds of compile time.
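To make the distinction concrete, here is a minimal sketch of the kind
of program I mean. It is not the exact file I measured. The helper
name firstMatch and the sample pattern are made up for illustration,
and I'm assuming "compiled regex" refers to std.regex's ctRegex, which
this sketch deliberately does not use.

```d
import std.regex : matchFirst, regex;
import std.stdio : writeln;

// Hypothetical helper: returns the first match of `pattern` in `text`,
// or null when there is none.
string firstMatch(string text, string pattern)
{
    // regex() parses the pattern when the program runs. Merely importing
    // and instantiating std.regex is what the compiler has to chew on.
    auto re = regex(pattern);
    auto m = matchFirst(text, re);
    return m.empty ? null : m.hit;
}

void main()
{
    // Sample input and pattern chosen only for illustration.
    writeln(firstMatch("please write this down", `w[a-z]+e`));
}
```

Compiling something like this under GNU time (for example
/usr/bin/time -v dmd app.d, assuming GNU time is installed) is one way
to read off the compiler's maximum resident set size.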
-----------
Anyhow, I wrote my own simple "dgrep" and compared it with grep.
It's very competitive (these are run-time numbers, NOT to be
confused with the above RAM stats for COMPILING):
Command being timed: "sh -c cat dgrep.d | ./dgrep 'write' "
User time (seconds): 0.00
System time (seconds): 0.00
Percent of CPU this job got: 0%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.00
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 3192
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 301
Voluntary context switches: 5
Involuntary context switches: 124
Swaps: 0
File system inputs: 8
File system outputs: 8
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
Command being timed: "sh -c cat dgrep.d | grep 'write'"
User time (seconds): 0.00
System time (seconds): 0.00
Percent of CPU this job got: 0%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.00
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 2224
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 2
Minor (reclaiming a frame) page faults: 282
Voluntary context switches: 10
Involuntary context switches: 0
Swaps: 0
File system inputs: 760
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
So I have to say I'm impressed with the actual performance of the
regular expressions engine--especially considering "grep" is,
IIRC, considered a fine-tuned beast.
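For anyone who wants to repeat the comparison, a tool in the spirit of
"dgrep" can be sketched as below. This is a reconstruction under
assumptions, not the actual dgrep.d source: the helper lineMatches,
the usage message, and the grep-like exit codes are my own choices.
It reads lines from stdin and prints those that match the pattern
given as the first argument.

```d
import std.regex : Regex, matchFirst, regex;
import std.stdio : stderr, stdin, writeln;

// Hypothetical helper: true when `line` contains a match for `re`.
bool lineMatches(const(char)[] line, Regex!char re)
{
    return !matchFirst(line, re).empty;
}

int main(string[] args)
{
    if (args.length != 2)
    {
        stderr.writeln("usage: dgrep PATTERN < input");
        return 2;
    }

    // Build the regex once, at run time, from the command-line pattern.
    auto re = regex(args[1]);
    bool found = false;

    // byLine reuses its buffer, which keeps memory use flat on large input.
    foreach (line; stdin.byLine)
    {
        if (lineMatches(line, re))
        {
            writeln(line);
            found = true;
        }
    }
    return found ? 0 : 1; // mimic grep: 0 if something matched, else 1
}
```

It would be invoked the same way as in the timings above, e.g.
cat dgrep.d | ./dgrep 'write'.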