some regex vs std.ascii vs handcode times

Jay Norwood jayn at prismnet.com
Wed Mar 21 21:29:40 PDT 2012


On Wednesday, 21 March 2012 at 05:49:29 UTC, Juan Manuel Cabo 
wrote:

>   Are you benchmarking the time of the whole program,
> or of just that snippet? Is the big array out of scope
> after the std.file.read() ? If so, try putting the benchmark
> start and end inside the function, at the same scope
> maybe inside that wcp_whole_file() function.

The empty loop measurement, which was the first benchmark, shows
that the overhead of everything outside the measured work is only
1 ms. What I'm measuring is a parallel foreach loop that calls
the small functions on 10 files, with the default number of
threadpool threads, which is 7 threads, based on the
documentation in std.parallelism.

I'm not measuring until the end of the program, or even the
console output. Everything is measured with the stopwatch timer,
started and stopped around the parallel foreach loop.
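
A minimal sketch of that measurement setup, for anyone who wants to
reproduce it before the real test code is posted. This is not the
actual benchmark: the file names come from the command line, and
countLines and totalLines are hypothetical names. It uses the
current std.datetime.stopwatch module; on 2012-era compilers
StopWatch lived in std.datetime.

```d
import core.atomic : atomicLoad, atomicOp;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.file : read;
import std.parallelism : parallel;
import std.stdio : writefln;

// Hand-coded line count over a raw buffer (hypothetical helper).
size_t countLines(const(ubyte)[] buffer)
{
    size_t lines;
    foreach (c; buffer)
        lines += (c == '\n');
    return lines;
}

void main(string[] args)
{
    auto files = args[1 .. $];   // assumption: file names passed as arguments
    shared size_t totalLines;

    // Timed region: only the parallel foreach, not program startup
    // and not the console output below.
    auto sw = StopWatch(AutoStart.yes);
    foreach (name; parallel(files))
    {
        auto buffer = cast(const(ubyte)[]) read(name);
        atomicOp!"+="(totalLines, countLines(buffer));
    }
    sw.stop();

    writefln("%s lines in %s ms",
             atomicLoad(totalLines), sw.peek.total!"msecs");
}
```

parallel(files) uses the default task pool, so the number of worker
threads is whatever std.parallelism picks for the machine.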

>
>
> [....]
>>
>> So here is some surprise ... why is regex 136ms vs 34 ms hand 
>> code?
>
> It's not surprising to me. I don't think that there
> is a single regex engine in the world (I don't think even
> the legendary Ken Thompson machine code engine) that can
> surpass a hand coded:
>      foreach(c; buffer) { lineCount += (c == '\n'); }
> for line counting.

Well, ok, but I think comments like the one below, from the
std.regex documentation, raise hopes of something approaching
hand code.

"  //create static regex at compile-time, contains fast native code
   enum ctr = ctRegex!(`^.*/([^/]+)/?$`);
"
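
To make the comparison concrete, here is a rough sketch of the two
approaches side by side. The pattern and both function names are
assumptions for illustration, not the code that produced the 136 ms
and 34 ms figures. matchAll is the current std.regex API; older
releases used match with the "g" flag instead.

```d
import std.regex : ctRegex, matchAll;

// Count newlines with a compile-time regex (hypothetical helper).
size_t countLinesRegex(const(char)[] text)
{
    auto newline = ctRegex!(`\n`);
    size_t lines;
    foreach (m; matchAll(text, newline))
        ++lines;
    return lines;
}

// Count newlines with the hand-coded loop from the quoted reply.
size_t countLinesHand(const(char)[] text)
{
    size_t lines;
    foreach (c; text)
        lines += (c == '\n');
    return lines;
}
```

Both functions should return the same count for the same buffer, so
any difference in a benchmark is purely the cost of the matching
machinery.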

I'll put the test code on GitHub this weekend. I still want to
try a few things that have been suggested.

On the chunk measurements ... I don't understand what is not
"fair". In both measurements I processed the entire file.


On the use of larger files ... yes, that will be interesting, but
for the current measurements the file reads are only taking on
the order of 30 ms for 20 MB, which tells me they are already
being cached, either by Win7 or by the SSD's cache.

I'll use the article instructions below and put the files being
read into the cache prior to the test, so that the file read
time should be small and consistent relative to the other buffer
processing time inside the loops.
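
Separately from whatever the article describes, a simple way to
approximate a warm cache from inside the test program is to read
every file once and throw the data away before starting the timer.
This is only a sketch: warmCache is a hypothetical helper, and
whether the data actually stays cached is up to the OS.

```d
import std.file : read;

// Read each file once, untimed, so a later timed read is more
// likely to be served from the OS file cache. Returns the total
// number of bytes touched.
size_t warmCache(const string[] names)
{
    size_t total;
    foreach (name; names)
        total += read(name).length;
    return total;
}
```

Calling warmCache on the file list before the timed loop keeps the
warm-up reads out of the measurement.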

http://us.generation-nt.com/activate-windows-file-caching-tip-tips-tricks-2130881-0.html


Thanks





