Streaming transport interfaces: input
Andrei Alexandrescu
SeeWebsiteForEmail at erdani.org
Thu Oct 14 11:22:07 PDT 2010
On 10/14/10 12:56 CDT, Denis Koroskin wrote:
> appendDelim *requires* buffering to be implemented. No OS provides
> an API to read from a file (be it a pipe, socket, whatever) up to
> some abstract delimiter. It *always* reads in blocks.
Clear. What may not be so clear is that read(ubyte[] buf) ALSO requires
buffering. Disk I/O comes in fixed buffer sizes (sometimes aligned at
512 bytes or whatever), so ANY protocol that allows the user to set the
maximum bytes to read will require buffering and copying. So how is
appendDelim worse than read?
> As such, if you
> need to read until a delimiter, you need to fetch a block into some internal
> buffer, MANUALLY search through it and THEN copy to the output string.
And there's no way for the client to efficiently do that.
> I've
> implemented that on top of a chunked read interface, and it was 5% faster
> than getline()/getdelim() that GNU libc provides (despite you claiming it
> to be "many times faster"). It's not.
Please post your code.
> Buffering requires an additional level of data copying, and this is bad
> for fast I/O.
Agreed. But then you define routines that also require buffering. How
do you reconcile your own requirement with your own interface?
> If you need fast I/O you must pull that out of the stream
> interface. Otherwise chunked read will be less efficient due to
> additional copies to and from buffers.
>
> On the contrary, line-based reading can be implemented on top of the
> chunked read without sacrificing a tiny bit of efficiency.
Except for extra copying.
appendDelim implementation:
1. Low-level read in internal buffers
2. Search for delimiter (assume found for simplicity)
3. Resize user buffer
4. Copy
That's one copy, with the necessary corner cases when the delimiter
isn't found yet etc. (which increase copying ONLY if the buffer is
actually moved when reallocated).
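
To make the four steps above concrete, here is a minimal sketch in Python rather than D. It is not the actual appendDelim implementation; the class name `BufferedSource`, the method name `append_delim`, the `block_size` parameter, and the `user_copies` counter are all hypothetical, and a single-byte delimiter is assumed so that a delimiter never straddles two blocks.

```python
import io


class BufferedSource:
    """Hypothetical stream that owns its internal block buffer."""

    def __init__(self, raw, block_size=4096):
        self.raw = raw                # any object with read(n) -> bytes
        self.block_size = block_size  # stands in for the OS/disk block size
        self.block = b""              # internal buffer (last block read)
        self.pos = 0                  # first unconsumed byte in self.block
        self.user_copies = 0          # copy operations into user buffers

    def append_delim(self, out, delim=b"\n"):
        """Append bytes up to and including delim to the bytearray `out`.

        Returns the number of bytes appended (0 at end of stream).
        """
        assert len(delim) == 1, "sketch assumes a single-byte delimiter"
        appended = 0
        while True:
            if self.pos == len(self.block):
                # 1. Low-level read into the internal buffer.
                self.block = self.raw.read(self.block_size)
                self.pos = 0
                if not self.block:
                    return appended  # end of stream
            # 2. Search for the delimiter inside the internal buffer.
            hit = self.block.find(delim, self.pos)
            end = len(self.block) if hit < 0 else hit + 1
            # 3 + 4. Grow the user buffer and copy straight into it.
            out += memoryview(self.block)[self.pos:end]
            self.user_copies += 1
            appended += end - self.pos
            self.pos = end
            if hit >= 0:
                return appended


source = BufferedSource(io.BytesIO(b"first line\nsecond line\n"))
line = bytearray()
source.append_delim(line)
```

When the delimiter is found inside the current block, the data goes from the internal buffer to the user buffer in one copy operation. Extra copy operations happen only when a line spans several blocks.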
The implementation in your message on 10/13/2010 21:20 CDT:
1. Low-level read in internal buffers
2. Copy from internal buffers into the internal buffer provided by your
ByLine implementation
3. Copy from the internal buffer of ByLine into the user-supplied buffer
That's two copies. Agreed?
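
The layered variant can be sketched the same way. This is a rough Python illustration, not the code from the message referred to above: `ChunkedStream`, `read_into`, `ByLine`, `append_line`, and the `bytes_copied` counters are hypothetical names, and a single-byte delimiter is assumed. Each layer counts the bytes it copies into a buffer owned by its caller.

```python
import io


class ChunkedStream:
    """Hypothetical chunked-read interface: fills a caller-supplied buffer."""

    def __init__(self, raw, block_size=4096):
        self.raw = raw
        self.block_size = block_size
        self.block = b""       # the stream's own internal buffer
        self.pos = 0
        self.bytes_copied = 0  # bytes copied into caller buffers

    def read_into(self, dest):
        """Copy up to len(dest) bytes into the bytearray `dest`."""
        if self.pos == len(self.block):
            self.block = self.raw.read(self.block_size)  # low-level read
            self.pos = 0
        n = min(len(dest), len(self.block) - self.pos)
        dest[:n] = self.block[self.pos:self.pos + n]     # copy no. 1
        self.pos += n
        self.bytes_copied += n
        return n


class ByLine:
    """Hypothetical line reader layered on top of ChunkedStream."""

    def __init__(self, stream, chunk_size=4096):
        self.stream = stream
        self.chunk = bytearray(chunk_size)  # ByLine's own internal buffer
        self.filled = 0                     # valid bytes in self.chunk
        self.pos = 0
        self.bytes_copied = 0               # bytes copied into user buffers

    def append_line(self, out, delim=b"\n"):
        """Append one line (delimiter included) to the bytearray `out`."""
        assert len(delim) == 1, "sketch assumes a single-byte delimiter"
        appended = 0
        while True:
            if self.pos == self.filled:
                self.filled = self.stream.read_into(self.chunk)
                self.pos = 0
                if self.filled == 0:
                    return appended  # end of stream
            hit = self.chunk.find(delim, self.pos, self.filled)
            end = self.filled if hit < 0 else hit + 1
            out += self.chunk[self.pos:end]              # copy no. 2
            appended += end - self.pos
            self.bytes_copied += end - self.pos
            self.pos = end
            if hit >= 0:
                return appended


data = b"first line\nsecond line\n"
stream = ChunkedStream(io.BytesIO(data))
reader = ByLine(stream)
line = bytearray()
while reader.append_line(line):
    line.clear()
total_copied = stream.bytes_copied + reader.bytes_copied
```

In this sketch every byte of input is counted once by each layer, so `total_copied` comes out at twice the input length.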
Andrei