some regex vs std.ascii vs handcode times

Juan Manuel Cabo juanmanuel.cabo at gmail.com
Tue Mar 20 22:49:27 PDT 2012


On Monday, 19 March 2012 at 04:12:33 UTC, Jay Norwood wrote:

[....]

> Ok, this was the good surprise.  Reading by chunks was faster 
> than reading the whole file, by several ms.
>
> // read files by chunk ...!better than full input
> //finished! time: 23 ms
> void wcp_files_by_chunk(string fn)
> {
> 	auto f = File(fn);
> 	foreach(chunk; f.byChunk(1_000_000)){
> 	}
> }

Try copying each received chunk into an array of the size of
the file, so that the comparison is fair (allocation time
of one big array != allocation time of single chunk). Plus,
std.file.read() might reallocate).
   Plus, I think that the GC runs at the end of a program,
and it's definitely not the same to have it scan through
a bigger heap (20Mb) than just scan through 1Mb of ram,
and with 20Mb of a file, it might have gotten 'distracted'
with pointer-looking data.
   Are you benchmarking the time of the whole program,
or of just that snippet? Is the big array out of scope
after the std.file.read() ? If so, try putting the benchmark
start and end inside the function, at the same scope
maybe inside that wcp_whole_file() function.


[....]
>
> So here is some surprise ... why is regex 136ms vs 34 ms hand 
> code?

It's not surprising to me. I don't think that there
is a single regex engine in the world (I don't think even
the legendary Ken Thompson machine code engine) that can
surpass a hand coded:
      foreach(c; buffer) { lineCount += (c == '\n'); }
for line counting.

The same goes for indexOf. If you are searching for
the position of a single char (unless maybe that the
regex is optimized for that specific case, I'd like
to know of one if it exists), I think nothing beats indexOf.


Regular expressions are for what you rather never hand code,
or for convenience.
I would rather write a regex to do a complex subtitution in Vim,
than make a Vim function just for that. They have an amazing
descriptive power. You are basically describing a program
in a single regex line.


When people talk about *fast* regexen, they talk comparatively
to other engines, not as an absolute measure.
This confuses a lot of people.

--jm





More information about the Digitalmars-d mailing list