Streaming library
Denis Koroskin
2korden at gmail.com
Wed Oct 13 14:05:55 PDT 2010
On Thu, 14 Oct 2010 00:19:45 +0400, Andrei Alexandrescu
<SeeWebsiteForEmail at erdani.org> wrote:
> On 10/13/10 14:02 CDT, Denis Koroskin wrote:
>> On Wed, 13 Oct 2010 20:55:04 +0400, Andrei Alexandrescu
>> <SeeWebsiteForEmail at erdani.org> wrote:
>>> http://www.gnu.org/s/libc/manual/html_node/Buffering-Concepts.html
>>>
>>> I don't think streams must mimic the low-level OS I/O interface.
>>>
>>
>> In contrast, I think that streams should be the lowest-level possible
>> platform-independent abstraction.
>> No buffering besides what an OS provides, no additional functionality.
>> If you need to be able to read something up to some character (besides,
>> what should be considered a new-line separator: \r, \n, \r\n?), this
>> should be done manually in "byLine".
>
> This aggravates client code for the sake of simplicity in a library that
> was supposed to make streaming easy. I'm not seeing progress.
>
This library code needs to be put somewhere. I just believe it belongs in a
line reader, not in a generic stream. By putting line reading into the stream
interface, you won't make it more efficient.
>>>> That's because
>>>> most of the streams are binary streams, and there is no such thing as a
>>>> "line" in them (e.g. how often do you need to read a line from a
>>>> SocketStream?).
>>>
>>> http://www.opengroup.org/onlinepubs/009695399/functions/isatty.html
>>>
>>
>> These are special cases I don't like. There is no such thing in Windows
>> anyway.
>
> I didn't say I like them. Windows has _isatty:
> http://msdn.microsoft.com/en-us/library/f4s0ddew(v=VS.80).aspx
>
I stand corrected. Windows pretends to be Posix compliant, yes, but that's
a sad story to tell. I don't see why would
>>> You need a line when e.g. you parse an HTML header or an email header or
>>> an FTP response. Again, if at a low level the transfer occurs in
>>> blocks, that doesn't mean the API must do the same at all levels.
>>>
>>
>> BSD sockets transmit in blocks. If you need to find a special sequence
>> in a socket stream, you are forced to fetch a chunk, and manually search
>> for a needed sequence. My position is that you should do it with an
>> external predicate (e.g. read until whitespace).
>
> Problem is how you set up interfaces to avoid inefficiencies and
> contortions in the client.
>
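The "external predicate" idea quoted above could look roughly like the following. This is only a sketch under assumptions: readUntil, its leftover parameter, and the convention that read() returns 0 at the end of the stream are hypothetical and do not come from any existing library.

```d
// Raw stream interface as quoted in this thread; read() is assumed to
// return 0 once the stream is exhausted.
interface InputStream
{
    size_t read(ubyte[] buf);
}

// Hypothetical helper: reads until isDelimiter matches a byte and returns
// the bytes before it. Bytes fetched past the delimiter are handed back
// through leftover so that the next call can start from them.
ubyte[] readUntil(InputStream stream, bool delegate(ubyte) isDelimiter,
                  ref ubyte[] leftover)
{
    ubyte[512] chunk;
    ubyte[] pending = leftover;
    leftover = null;
    size_t scanned = 0; // bytes of pending already checked
    while (true)
    {
        foreach (i, b; pending[scanned .. $])
        {
            if (isDelimiter(b))
            {
                leftover = pending[scanned + i + 1 .. $];
                return pending[0 .. scanned + i];
            }
        }
        scanned = pending.length;
        immutable n = stream.read(chunk[]);
        if (n == 0)
            return pending; // end of stream: return whatever is left
        pending ~= chunk[0 .. n];
    }
}
```

The stream itself stays a plain chunk reader; all the searching and the carried-over bytes live in client-side code, which is exactly the trade-off being argued about here.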
>>>> I don't think streams should buffer anything either (what an underlying
>>>> OS I/O API caches should suffice); buffered stream adapters can do that
>>>> in a stream-independent way (why duplicate code when you can do that as
>>>> efficiently with external methods?).
>>>
>>> Most OS primitives don't give access to their own internal buffers.
>>> Instead, they ask user code to provide a buffer and transfer data into
>>> it.
>>
>> Right. This is why Stream may not cache.
>
> This is a big misunderstanding. If the interface is:
>
> size_t read(byte[] buffer);
>
> then *I*, the client, need to provide the buffer. It's in client space.
> This means willing or not I need to do buffering, regardless of whatever
> internal buffering is going on under the wraps.
>
Use a BufferedStream adapter if you need buffering, and raw streams if you
do the buffering manually.
That's the way it's implemented in C#, Java, Tango and many, many other
APIs.
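To make the adapter idea concrete, here is a minimal sketch of such a wrapper. The class layout, the default capacity, and the assumption that read() returns 0 at the end of the stream are all illustrative; this is not the API of C#, Java, Tango, or Phobos.

```d
import std.algorithm.comparison : min;

// Raw stream interface as quoted in this thread; read() is assumed to
// return 0 once the stream is exhausted.
interface InputStream
{
    size_t read(ubyte[] buf);
}

// Hypothetical adapter that adds buffering on top of any InputStream.
class BufferedStream : InputStream
{
    private InputStream source;
    private ubyte[] storage;   // backing buffer
    private ubyte[] available; // unread part of storage

    this(InputStream source, size_t capacity = 4096)
    {
        this.source = source;
        storage = new ubyte[capacity];
    }

    size_t read(ubyte[] buf)
    {
        if (available.length == 0)
        {
            // Touch the raw stream only when the buffer is drained.
            immutable n = source.read(storage);
            available = storage[0 .. n];
        }
        immutable count = min(buf.length, available.length);
        buf[0 .. count] = available[0 .. count];
        available = available[count .. $];
        return count;
    }
}
```

Because the adapter implements the same interface it wraps, code written against InputStream works with either the raw or the buffered variant.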
>>> So clearly buffering on the client side is a must.
>>>
>>
>> I don't see how that is implied by the above.
>
> Please implement an abstraction that given this:
>
> interface InputStream
> {
> size_t read(ubyte[] buf);
> }
>
> defines a line reader.
>
I thought we agreed that byLine/byChunk need to do buffering manually
anyway.
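import std.array : Appender;

enum BUFFER_SIZE = 4096; // assumed chunk size; any reasonable value works

class ByLine
{
    // Returns the next line without its '\n'. Assumes InputStream also
    // provides endOfStream(), which the interface quoted above lacks.
    ubyte[] nextLine()
    {
        ubyte[BUFFER_SIZE] buffer;
        size_t scanned = 0; // bytes of appender.data already searched
        while (true) {
            // Search the buffered bytes first: a previous read may have
            // left one or more complete lines behind.
            ubyte[] data = appender.data;
            foreach (i, ubyte c; data[scanned .. $]) {
                if (c != '\n') {
                    continue;
                }
                ubyte[] line = data[0 .. scanned + i].dup;
                ubyte[] rest = data[scanned + i + 1 .. $].dup;
                appender.clear();
                appender.put(rest);
                return line;
            }
            scanned = data.length;
            if (inputStream.endOfStream()) {
                break;
            }
            size_t bytesRead = inputStream.read(buffer[]);
            appender.put(buffer[0 .. bytesRead]);
        }
        // End of stream: whatever is left is the final line.
        ubyte[] line = appender.data.dup;
        appender.clear();
        return line;
    }
    InputStream inputStream;
    Appender!(ubyte[]) appender;
}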
(I've skipped the range interface for the sake of simplicity, replacing it
with a nextLine() function. I also don't remember the proper Appender
interface, so I've used imaginary function names.)
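For completeness, the skipped range interface might be layered on top roughly as follows. This is a sketch under assumptions: LineRange and the readLine delegate signature (returning false at the end of the stream, so that empty lines can be told apart from the end) are hypothetical and are not part of the class above.

```d
// Hypothetical wrapper exposing a line reader as a D input range
// (empty / front / popFront).
struct LineRange
{
    // Assumed contract: stores the next line in `line` and returns true,
    // or returns false once the stream is exhausted.
    private bool delegate(ref ubyte[] line) readLine;
    private ubyte[] current;
    private bool done;

    this(bool delegate(ref ubyte[] line) readLine)
    {
        this.readLine = readLine;
        popFront(); // prime the first line
    }

    @property bool empty() const { return done; }
    @property ubyte[] front() { return current; }
    void popFront() { done = !readLine(current); }
}
```

With that in place, client code can write foreach (line; LineRange(&reader)) without knowing how the lines are produced.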
Once again, what's the point of byLine if all it does is call
stream.readLine()? That's moving code from one place to many unrelated
ones. I don't agree with that.
I'm not convinced we need a line-based API at the core stream level. I don't
think we need to sacrifice performance in the general case in order to
avoid a performance hit in a special case. Who even told you it will be any
less efficient that way?
>
> Andrei