some regex vs std.ascii vs handcode times

Andrei Alexandrescu SeeWebsiteForEmail at erdani.org
Mon Mar 19 10:23:36 PDT 2012


On 3/18/12 11:12 PM, Jay Norwood wrote:
> I'm timing operations processing 10 2MB text files in parallel. I
> haven't gotten to the part where I put the words in the map, but I've
> done enough through this point to say a few things about the measurements.

Great work! This prompts quite a few bug reports and enhancement 
suggestions - please submit them to bugzilla.

Two quick notes:

> On the other end of the spectrum is the byLine version of the read. So
> this is way too slow to be promoting in our examples, and if anyone is
> using this in their code they should instead read chunks ... maybe 1MB like
> in my example later below, and then split up the lines themselves.
>
> // read files by line ... yikes! don't want to do this
> // finished! time: 485 ms
> void wcp_byLine(string fn)
> {
>     auto f = File(fn);
>     foreach (line; f.byLine(std.string.KeepTerminator.yes))
>     {
>     }
> }
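
For the record, a chunked variant along the lines you describe might look
roughly like the sketch below. It's untested; the 1MB chunk size, the
leftover handling, and the line counter standing in for the real per-line
work are all assumptions of mine, not your code.

import std.stdio, std.algorithm, std.array;

void wcp_byChunk(string fn)
{
    auto f = File(fn);
    char[] leftover;                  // partial line carried across chunks
    size_t lines;
    foreach (ubyte[] chunk; f.byChunk(1024 * 1024))
    {
        auto text = leftover ~ cast(char[]) chunk;  // copies; fine for a sketch
        auto pieces = splitter(text, '\n').array();
        lines += pieces.length - 1;   // every piece but the last ended in '\n'
        leftover = pieces[$ - 1].dup; // possibly incomplete final line
    }
    if (leftover.length) ++lines;     // file didn't end with a newline
    writeln(fn, ": ", lines, " lines");
}

A tuned version would avoid the leftover ~ chunk copy; this is only meant
to show the shape of the loop.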

What OS did you use? (The implementation of byLine varies a lot across OSs.)

For a long time I've wanted to improve byLine by letting it do its own 
buffering. That means once you've started using byLine, it would no longer 
be possible to stop, get back to the original File, and continue reading 
from it; using byLine becomes a commitment. That's how most code uses it 
anyway.
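
To make the trade-off concrete, here is the kind of mixed use that works
today but would have to be given up if byLine read ahead into its own
buffer (a contrived sketch with a made-up file name):

import std.stdio;

void main()
{
    auto f = File("data.txt");        // hypothetical input file
    // Consume just the header line through byLine...
    foreach (line; f.byLine())
    {
        writeln("header: ", line);
        break;
    }
    // ...then go back to using the File directly for the rest.
    // With read-ahead buffering inside byLine, the File's position would
    // no longer be just past the header, so this pattern would break.
    char[] buf;
    f.readln(buf);
    writeln("next line: ", buf);
}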

> Ok, this was the good surprise. Reading by chunks was faster than
> reading the whole file, by several ms.

Cache effects may be at work here: reusing the same 1MB buffer may keep it 
in fast cache memory, whereas reading 20MB in one go may spill into slower 
levels of the memory hierarchy.
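
One way to check would be to time the two reads side by side on the same
file, along these lines (a rough sketch with a placeholder file name, not
the code behind your numbers):

import std.stdio, std.file, std.datetime;

void main()
{
    enum fn = "input.txt";            // placeholder for one ~20MB input file

    StopWatch sw;

    // Whole-file read: one large allocation, touched once front to back.
    sw.start();
    auto whole = cast(ubyte[]) std.file.read(fn);
    sw.stop();
    writeln("whole file: ", sw.peek().msecs, " ms (", whole.length, " bytes)");

    // Chunked read: byChunk reuses a single 1MB buffer, which has a better
    // chance of staying resident in cache between iterations.
    sw.reset();
    sw.start();
    size_t total;
    auto f = File(fn);
    foreach (chunk; f.byChunk(1024 * 1024))
        total += chunk.length;
    sw.stop();
    writeln("1MB chunks: ", sw.peek().msecs, " ms (", total, " bytes)");
}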


Andrei

