some regex vs std.ascii vs handcode times
Jay Norwood
jayn at prismnet.com
Wed Mar 21 21:29:40 PDT 2012
On Wednesday, 21 March 2012 at 05:49:29 UTC, Juan Manuel Cabo
wrote:
> Are you benchmarking the time of the whole program,
> or of just that snippet? Is the big array out of scope
> after the std.file.read() ? If so, try putting the benchmark
> start and end inside the function, at the same scope
> maybe inside that wcp_whole_file() function.
The empty loop measurement, which was the first benchmark, shows
that the overhead of everything outside the measurement is only
1ms. What I'm measuring is a parallel foreach loop that calls
the small functions on 10 files, with the default number of
threadpool threads, which is 7 on this machine (totalCPUs - 1, per
the documentation in std.parallelism).
I'm not measuring until the end of the program, or even the console
output. Everything is measured with the StopWatch timer, placed
around the parallel foreach loop.
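A minimal sketch of that kind of measurement, for readers who want to reproduce it. This is not the actual test code: the function names and the fileNames parameter are hypothetical, and it uses the current std.datetime.stopwatch module rather than the 2012 API. The point is only that the timer starts and stops immediately around the parallel foreach.

```d
import core.atomic : atomicLoad, atomicOp;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.file : read;
import std.parallelism : parallel;

// Hand-coded newline count over a raw byte buffer.
size_t countLines(const(ubyte)[] buffer)
{
    size_t lineCount = 0;
    foreach (c; buffer)
        lineCount += (c == '\n'); // the bool result adds 0 or 1
    return lineCount;
}

// Times only the parallel loop and returns the elapsed milliseconds.
// fileNames is a hypothetical list of input paths.
long timeParallelCount(string[] fileNames, out size_t totalLines)
{
    shared size_t total = 0;
    auto sw = StopWatch(AutoStart.yes); // start of the measured region
    foreach (name; parallel(fileNames))
    {
        auto buffer = cast(const(ubyte)[]) read(name);
        atomicOp!"+="(total, countLines(buffer));
    }
    sw.stop(); // end of the measured region
    totalLines = atomicLoad(total);
    return sw.peek.total!"msecs";
}
```

Calling timeParallelCount with an empty loop body instead of the read and count gives the baseline overhead figure mentioned above.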
>
>
> [....]
>>
>> So here is some surprise ... why is regex 136ms vs 34 ms hand
>> code?
>
> It's not surprising to me. I don't think that there
> is a single regex engine in the world (I don't think even
> the legendary Ken Thompson machine code engine) that can
> surpass a hand coded:
> foreach(c; buffer) { lineCount += (c == '\n'); }
> for line counting.
Well, ok, but I think comments like the one below, from the std.regex
documentation, raise hopes of something approaching hand code.
" //create static regex at compile-time, contains fast native code
enum ctr = ctRegex!(`^.*/([^/]+)/?$`);
"
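For reference, a sketch of how the two approaches might be compared on the line-counting task. The pattern and both function names are assumptions made for illustration, not the code that produced the 136ms and 34ms figures.

```d
import std.range : walkLength;
import std.regex : ctRegex, matchAll;

// Counts newlines with a regex compiled at compile time.
size_t countLinesRegex(string text)
{
    auto newline = ctRegex!(`\n`);
    // matchAll yields one match per newline; walkLength counts them.
    return matchAll(text, newline).walkLength;
}

// Hand-coded equivalent, for comparison.
size_t countLinesByHand(string text)
{
    size_t lineCount = 0;
    foreach (char c; text)
        lineCount += (c == '\n');
    return lineCount;
}
```

Both functions should return the same count for the same input, so any timing difference comes from the matching machinery rather than from the work being done.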
I'll put the test code on github this weekend. I still want to
try a few things that have been suggested.
On the chunk measurements ... I don't understand what is not
"fair". In both measurements I processed the entire file.
On the use of larger files ... yes that will be interesting, but
for these current measurements the file reads are only taking on
the order of 30ms for 20MB, which tells me they are already
being cached, either by Win7 or by the SSD's cache.
I'll follow the instructions in the article below to put the files
being read into the cache prior to the test, so that the file read
time is small and consistent relative to the buffer processing
time inside the loops.
http://us.generation-nt.com/activate-windows-file-caching-tip-tips-tricks-2130881-0.html
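A different and simpler way to get a similar effect, sketched here as an assumption rather than as the article's method, is an untimed warm-up pass that reads every file once before the stopwatch starts. The function name and the fileNames parameter are hypothetical, and whether the data stays cached still depends on the OS and on available memory.

```d
import std.file : read;

// Reads each file once and discards the contents, so that a later timed
// pass is more likely to be served from the OS file cache.
// Returns the total number of bytes read, as a sanity check.
size_t warmFileCache(string[] fileNames)
{
    size_t totalBytes = 0;
    foreach (name; fileNames)
        totalBytes += read(name).length;
    return totalBytes;
}
```

Running this once before the timed loop, and then repeating the timed loop a few times, should show whether the read times have settled.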
Thanks