some regex vs std.ascii vs handcode times

Jay Norwood jayn at prismnet.com
Mon Mar 19 16:37:16 PDT 2012


On Monday, 19 March 2012 at 17:23:36 UTC, Andrei Alexandrescu 
wrote:
> On 3/18/12 11:12 PM, Jay Norwood wrote:
>> I'm timing operations processing 10 2MB text files in 
>> parallel. I
>> haven't gotten to the part where I put the words in the map, 
>> but I've
>> done enough through this point to say a few things about the 
>> measurements.
>
> Great work! This prompts quite a few bug reports and 
> enhancement suggestions - please submit them to bugzilla.

I don't know if they are bugs.  On D.learn I got the 
explanation that matches.captures.length() just returns the 
number of matches for the expressions surrounded by (), so I 
don't think it can be used to count lines, other than inside a 
for loop.  std.algorithm.count works ok, but I was hoping there 
was something in ctRegex that would make it as fast as the 
hand-coded string scan.
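For reference, a minimal sketch (not the original benchmark) of 
the two line-counting approaches being compared, assuming the 
file contents are already in a buffer:

    import std.algorithm : count;

    // library version: counts '\n' via std.algorithm.count
    size_t countLinesStd(const(char)[] text)
    {
        return text.count('\n');
    }

    // hand-coded version: plain byte scan over the buffer
    size_t countLinesHand(const(char)[] text)
    {
        size_t n = 0;
        foreach (c; text)
            if (c == '\n')
                ++n;
        return n;
    }

Both return the same result; the question is only which one the 
compiler turns into the tighter loop.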

>
> Two quick notes:
>
>> On the other end of the spectrum is the byLine version of the 
>> read. So
>> this is way too slow to be promoting in our examples, and if 
>> anyone is
>> using this in the code you should instead read chunks ... 
>> maybe 1MB like
>> in my example later below, and then split up the lines 
>> yourself.
>>
>> // read files by line ... yikes! don't want to do this
>> //finished! time: 485 ms
>> void wcp_byLine(string fn)
>> {
>>     auto f = File(fn);
>>     foreach (line; f.byLine(std.string.KeepTerminator.yes)) {
>>     }
>> }
>
> What OS did you use? (The implementation of byLine varies a lot 
> across OSs.)

I'm doing everything on win7-64 right now.
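The chunked-read alternative suggested above might look roughly 
like this; it is a sketch only, and this simplified version does 
not stitch together lines that straddle a chunk boundary, which 
a real word count would need to handle:

    import std.stdio : File;
    import std.algorithm : splitter;

    void wcp_byChunk(string fn)
    {
        auto f = File(fn);
        // read 1MB at a time instead of line by line
        foreach (chunk; f.byChunk(1024 * 1024))
        {
            // split the lines up ourselves
            foreach (line; splitter(cast(char[]) chunk, '\n'))
            {
                // process line here
            }
        }
    }

Reusing one 1MB buffer per task is also what makes the cache 
effects discussed below possible.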


>
> I wanted for a long time to improve byLine by allowing it to do 
> its own buffering. That means once you used byLine it's not 
> possible to stop it, get back to the original File, and 
> continue reading it. Using byLine is a commitment. This is what 
> most uses of it do anyway.
>
>> Ok, this was the good surprise. Reading by chunks was faster 
>> than
>> reading the whole file, by several ms.
>
> What may be at work here is cache effects. Reusing the same 1MB 
> may place it in faster cache memory, whereas reading 20MB at 
> once may spill into slower memory.

Yes, I would guess that's the problem. This Core i7 has an 8MB 
cache, and the threadpool creates 7 active tasks by default, as 
I understand it, so even 1MB blocks are on the border when 
running in parallel. I'll lower the chunk size to some level 
that seems reasonable and retest.
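A sketch of what that retest setup could look like, assuming 
hypothetical file names and the default taskPool worker count 
(totalCPUs - 1, i.e. 7 on this machine):

    import std.parallelism : parallel;
    import std.stdio : File;

    // below 1MB so that 7 workers' buffers together stay
    // comfortably inside the 8MB shared cache
    enum chunkSize = 512 * 1024;

    void main()
    {
        // hypothetical input files, one task per file
        auto files = ["t0.txt", "t1.txt", "t2.txt"];
        foreach (fn; parallel(files, 1))
        {
            foreach (chunk; File(fn).byChunk(chunkSize))
            {
                // scan the chunk here
            }
        }
    }

Shrinking chunkSize trades more read calls for a smaller 
combined working set across the workers.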

>
>
> Andrei




More information about the Digitalmars-d mailing list