some regex vs std.ascii vs handcode times

Juan Manuel Cabo juanmanuel.cabo at gmail.com
Tue Mar 20 22:06:23 PDT 2012


On Monday, 19 March 2012 at 17:23:36 UTC, Andrei Alexandrescu wrote:

[.....]

>
> I wanted for a long time to improve byLine by allowing it to do 
> its own buffering. That means once you used byLine it's not 
> possible to stop it, get back to the original File, and 
> continue reading it. Using byLine is a commitment. This is what 
> most uses of it do anyway.

Great! Perhaps we don't have to choose; we could have both.
Allow me to suggest:

       byLineBuffered(bufferSize, keepTerminator);
or    byLineOnly(bufferSize, keepTerminator);
or    byLineChunked(bufferSize, keepTerminator);
or    byLineFastAndDangerous :-) hahah :-)

Or the other way around, making the buffered version the default:

       byLine(keepTerminator, underlyingBufferSize);
and renaming the current one to:
       byLineUnbuffered(keepTerminator);
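To make the proposal concrete, here is a rough sketch of what a self-buffering byLine(keepTerminator, underlyingBufferSize) could do. It is written in Python rather than D, and by_line_buffered, buffer_size and keep_terminator are made-up names, not Phobos API. It reads the file in fixed-size chunks into its own buffer and yields complete lines; because it reads ahead, the file position ends up past the last line handed out, which is exactly why using it would be a commitment.

```python
import io
from typing import BinaryIO, Iterator


def by_line_buffered(f: BinaryIO, buffer_size: int = 64 * 1024,
                     keep_terminator: bool = False) -> Iterator[bytes]:
    """Hypothetical self-buffering line reader (a sketch, not Phobos' byLine).

    Reads `f` in chunks of `buffer_size` bytes and yields one line at a time.
    Because it reads ahead, the position of `f` is usually past the last
    line yielded, so the caller cannot go back to `f` and resume there.
    """
    pending = bytearray()  # bytes read but not yet returned as a line
    while True:
        chunk = f.read(buffer_size)
        if not chunk:
            break
        pending += chunk
        start = 0
        # Emit every complete line currently held in the buffer.
        while True:
            nl = pending.find(b"\n", start)
            if nl == -1:
                break
            end = nl + 1 if keep_terminator else nl
            yield bytes(pending[start:end])
            start = nl + 1
        del pending[:start]  # keep only the unfinished last line
    if pending:
        yield bytes(pending)  # final line with no terminator


if __name__ == "__main__":
    src = io.BytesIO(b"alpha\nbeta\ngamma")
    print(list(by_line_buffered(src, buffer_size=4)))
    # prints [b'alpha', b'beta', b'gamma']
```

After next() returns the first line, the underlying file has already been read up to buffer_size bytes ahead, so mixing this reader with direct reads on the same file would skip data.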

Other ideas (I think I read them somewhere in this same byLine
discussion):
   * If buffering is added, it would be nice if 'line' could be a
slice of the underlying buffer whenever possible, avoiding a copy.
   * Another good idea would be a new argument, maxLineLength, so
that if the file contains no newlines, one can avoid reading and
allocating the whole file as one big line string when one knows
the maximum length desired.
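A sketch of both ideas together, again in Python with hypothetical names (by_line_slices and max_line_length are not Phobos API). Lines are yielded as memoryview slices of one reused buffer, so no per-line copy is made, and any line longer than the limit is split into pieces (one could instead truncate or throw; nothing above says which), so a file without newlines never turns into one giant allocation. The catch, as with any slice of a reused buffer, is that each slice is only valid until the next line is requested; the limit is also capped at buffer_size - 1 here to keep the bookkeeping simple.

```python
import io
from typing import BinaryIO, Iterator, Optional


def by_line_slices(f: BinaryIO, buffer_size: int = 64 * 1024,
                   max_line_length: Optional[int] = None) -> Iterator[memoryview]:
    """Hypothetical reader yielding lines (without b'\\n') as buffer slices.

    Each memoryview aliases one reused internal buffer and is only valid until
    the next line is requested. Lines longer than the limit (max_line_length,
    capped at buffer_size - 1) are returned in pieces. Assumes a blocking
    stream whose readinto() returns 0 only at end of file.
    """
    limit = buffer_size - 1
    if max_line_length is not None:
        limit = min(limit, max_line_length)
    if limit < 1:
        raise ValueError("need max_line_length >= 1 and buffer_size >= 2")
    buf = bytearray(buffer_size)
    view = memoryview(buf)
    filled = 0  # number of valid bytes at the front of buf
    start = 0   # offset of the first byte not yet handed out
    eof = False
    while True:
        nl = buf.find(b"\n", start, filled)
        if nl != -1 and nl - start <= limit:
            yield view[start:nl]             # whole line, no copy
            start = nl + 1
        elif filled - start > limit:
            yield view[start:start + limit]  # over-long line: emit one piece
            start += limit
        elif eof:
            if start < filled:
                yield view[start:filled]     # last line without a terminator
            return
        else:
            # Move the unfinished line to the front, then refill the rest.
            # Same-length slice assignment, so buf is never resized.
            remaining = filled - start
            buf[:remaining] = buf[start:filled]
            filled, start = remaining, 0
            n = f.readinto(view[filled:])
            if not n:
                eof = True
            else:
                filled += n


if __name__ == "__main__":
    pieces = by_line_slices(io.BytesIO(b"x" * 10), max_line_length=4)
    print([bytes(p) for p in pieces])  # prints [b'xxxx', b'xxxx', b'xx']
```

Callers that need to keep a line past the next iteration must copy it themselves, e.g. with bytes(line).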

--jm


>
>> Ok, this was the good surprise. Reading by chunks was faster than
>> reading the whole file, by several ms.
>
> What may be at work here is cache effects. Reusing the same 1MB 
> may place it in faster cache memory, whereas reading 20MB at 
> once may spill into slower memory.
>
>
> Andrei
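For reference, the two strategies being compared can be sketched like this (Python again; count_lines_chunked and count_lines_whole are hypothetical helpers, not the benchmark code from the thread). The chunked version refills one preallocated buffer with readinto, so the same memory is reused on every iteration, which is the situation Andrei suggests may stay in faster cache; the other reads the whole file into a single bytes object. Nothing here measures anything, so whether chunking wins will depend on the machine.

```python
import io
from typing import BinaryIO


def count_lines_chunked(f: BinaryIO, chunk_size: int = 1 << 20) -> int:
    """Count newlines by refilling one reused buffer (hypothetical helper)."""
    buf = bytearray(chunk_size)  # allocated once, reused for every chunk
    total = 0
    while True:
        n = f.readinto(buf)
        if not n:  # 0 means end of file for a blocking stream
            break
        total += buf.count(b"\n", 0, n)  # only the first n bytes are fresh
    return total


def count_lines_whole(f: BinaryIO) -> int:
    """Count newlines after reading the entire file into one bytes object."""
    return f.read().count(b"\n")


if __name__ == "__main__":
    data = b"line\n" * 1000
    print(count_lines_chunked(io.BytesIO(data), chunk_size=64),
          count_lines_whole(io.BytesIO(data)))  # prints 1000 1000
```

With the default chunk_size of 1 << 20 the buffer matches the 1MB figure above; Python timings of these two would not necessarily say anything about D, though.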






More information about the Digitalmars-d mailing list