std.regex is fat

Chris Katko ckatko at gmail.com
Sun Oct 14 10:07:15 UTC 2018


On Sunday, 14 October 2018 at 03:26:33 UTC, Adam D. Ruppe wrote:
> On Sunday, 14 October 2018 at 03:07:59 UTC, Chris Katko wrote:
>> For comparison, I just tested and grep uses about 4 MB of RAM 
>> to run.
>
> Running and compiling are two entirely different things. 
> Running the D regex code should be comparable, but compiling it 
> is slow, in great part because of internal templates...
>
> There was an effort to speed up the template code, but it is 
> still not complete.

I know that. I figured people might miss my point, so I should 
have clarified. That's why I said it's likely the templates/DMD 
that's exploding--not the actual regex matching.

A simple program takes ~100-150MB of RAM to compile. Adding a 
single regex (not a compiled regex) balloons that to 550MB and 5 
seconds of compile time.

-----------

Anyhow, I wrote my own simple "dgrep" and compared its results 
with grep. It's very competitive (NOT to be confused with the 
above RAM stats for COMPILING):


	Command being timed: "sh -c cat dgrep.d | ./dgrep 'write' "
	User time (seconds): 0.00
	System time (seconds): 0.00
	Percent of CPU this job got: 0%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.00
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 3192
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 0
	Minor (reclaiming a frame) page faults: 301
	Voluntary context switches: 5
	Involuntary context switches: 124
	Swaps: 0
	File system inputs: 8
	File system outputs: 8
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0

	Command being timed: "sh -c cat dgrep.d | grep 'write'"
	User time (seconds): 0.00
	System time (seconds): 0.00
	Percent of CPU this job got: 0%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.00
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 2224
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 2
	Minor (reclaiming a frame) page faults: 282
	Voluntary context switches: 10
	Involuntary context switches: 0
	Swaps: 0
	File system inputs: 760
	File system outputs: 0
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0

So I have to say I'm impressed with the actual performance of the 
regular expression engine--especially considering "grep" is, 
IIRC, considered a fine-tuned beast.
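
For anyone wanting to reproduce the comparison, a filter along the 
lines of the "dgrep" mentioned above might look roughly like the 
sketch below. This is an assumption about its shape, not the code 
that produced the numbers in this post: the helper name lineMatches 
and the usage message are hypothetical, and it uses a run-time 
regex() rather than ctRegex.

```d
import std.regex : Regex, matchFirst, regex;
import std.stdio : stderr, stdin, writeln;

// Hypothetical helper: true when `line` contains a match for `re`.
bool lineMatches(const(char)[] line, Regex!char re)
{
    return !matchFirst(line, re).empty;
}

int main(string[] args)
{
    if (args.length != 2)
    {
        stderr.writeln("usage: dgrep PATTERN < input");
        return 2;
    }

    // Build the regex at run time from the command-line pattern.
    auto re = regex(args[1]);

    // byLine reuses its buffer, which is fine here because each line
    // is printed before the next one is read.
    foreach (line; stdin.byLine)
    {
        if (lineMatches(line, re))
            writeln(line);
    }
    return 0;
}
```

Built with "dmd dgrep.d", it could be run the same way as the timed 
command above, e.g. cat dgrep.d | ./dgrep 'write'.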

