Streaming library

Andrei Alexandrescu SeeWebsiteForEmail at erdani.org
Wed Oct 13 14:23:23 PDT 2010


On 10/13/10 16:05 CDT, Denis Koroskin wrote:
> On Thu, 14 Oct 2010 00:19:45 +0400, Andrei Alexandrescu
> <SeeWebsiteForEmail at erdani.org> wrote:
>
>> On 10/13/10 14:02 CDT, Denis Koroskin wrote:
>>> On Wed, 13 Oct 2010 20:55:04 +0400, Andrei Alexandrescu
>>> <SeeWebsiteForEmail at erdani.org> wrote:
>>>> http://www.gnu.org/s/libc/manual/html_node/Buffering-Concepts.html
>>>>
>>>> I don't think streams must mimic the low-level OS I/O interface.
>>>>
>>>
>>> I, in contrast, think that streams should be the lowest-level possible
>>> platform-independent abstraction.
>>> No buffering besides what an OS provides, no additional functionality.
>>> If you need to be able to read something up to some character (besides,
>>> what should be considered a new-line separator: \r, \n, \r\n?), this
>>> should be done manually in "byLine".
>>
>> This aggravates client code for the sake of simplicity in a library
>> that was supposed to make streaming easy. I'm not seeing progress.
>>
>
> This library code needs to be put somewhere. I just believe it belongs
> in a line reader, not a generic stream. By putting line reading into a
> stream interface, you won't make it more efficient.
>
>>>>> That's because
>>>>> most of the streams are binary streams, and there is no such thing as a
>>>>> "line" in them (e.g. how often do you need to read a line from a
>>>>> SocketStream?).
>>>>
>>>> http://www.opengroup.org/onlinepubs/009695399/functions/isatty.html
>>>>
>>>
>>> These are special cases I don't like. There is no such thing in Windows
>>> anyway.
>>
>> I didn't say I like them. Windows has _isatty:
>> http://msdn.microsoft.com/en-us/library/f4s0ddew(v=VS.80).aspx
>>
>
> I stand corrected. Windows pretends to be Posix compliant, yes, but
> that's a sad story to tell. I don't see why would
>
>>>> You need a line when e.g. you parse an HTML header or an email header or
>>>> an FTP response. Again, if at a low level the transfer occurs in
>>>> blocks, that doesn't mean the API must do the same at all levels.
>>>>
>>>
>>> BSD sockets transmit in blocks. If you need to find a special sequence
>>> in a socket stream, you are forced to fetch a chunk, and manually search
>>> for a needed sequence. My position is that you should do it with an
>>> external predicate (e.g. read until whitespace).
>>
>> Problem is how you set up interfaces to avoid inefficiencies and
>> contortions in the client.
>>
>>>>> I don't think streams should buffer anything either (what an underlying
>>>>> OS I/O API caches should suffice); buffered stream adapters can do that
>>>>> in a stream-independent way (why duplicate code when you can do that as
>>>>> efficiently with external methods?).
>>>>
>>>> Most OS primitives don't give access to their own internal buffers.
>>>> Instead, they ask user code to provide a buffer and transfer data into
>>>> it.
>>>
>>> Right. This is why Stream may not cache.
>>
>> This is a big misunderstanding. If the interface is:
>>
>> size_t read(byte[] buffer);
>>
>> then *I*, the client, need to provide the buffer. It's in client
>> space. This means that, willing or not, I need to do buffering,
>> regardless of whatever internal buffering is going on under wraps.
>>
>
> Use BufferedStream adapter if you need buffering, and raw streams if you
> do the buffering manually.
> That's the way it's implemented in C#, Java, Tango and many many other
> APIs.
>
>>>> So clearly buffering on the client side is a must.
>>>>
>>>
>>> I don't see how that is implied by the above.
>>
>> Please implement an abstraction that given this:
>>
>> interface InputStream
>> {
>>     size_t read(ubyte[] buf);
>> }
>>
>> defines a line reader.
>>
>
> I thought we agreed that byLine/byChunk need to do buffering manually
> anyway.
>
> class ByLine
> {
>     ubyte[] nextLine()
>     {
>         ubyte[BUFFER_SIZE] buffer;
>         while (!inputStream.endOfStream()) {
>             size_t bytesRead = inputStream.read(buffer);
>             foreach (i, ubyte c; buffer[0..bytesRead]) {
>                 if (c != '\n') {
>                     continue;
>                 }
>
>                 appender.put(buffer[0..i]);
>                 ubyte[] line = appender.data.dup();
>                 appender.reset();
>                 appender.put(buffer[i+1..$]);
>
>                 return line;
>             }
>
>             appender.put(buffer[0..bytesRead]);
>         }
>
>         ubyte[] line = appender.data.dup();
>         appender.reset();
>         return line;
>     }
>
>     InputStream inputStream;
>     Appender!(ubyte[]) appender;
> }
>
> (I've skipped the range interface for the sake of simplicity, replaced
> it with nextLine() function. I also don't remember proper appender
> interface, so I've used imaginary function names).
>
> Once again, what's the point of byLine, if all it does is call
> stream.readLine(); ? That's moving code from one place to many unrelated
> ones. I don't agree with that.
>
> I'm not convinced we need a line-based API at the core stream level. I don't
> think we need to sacrifice performance in the general case in order to
> avoid a performance hit in a special case. Who even told you it will be
> any less efficient that way?

The code above.
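
To make the bookkeeping concrete, here is a minimal sketch, under stated assumptions, of a line reader layered over the size_t read(ubyte[] buf) interface quoted above. Only InputStream and read come from the thread. MemoryStream, LineReader, pending and the "read returns 0 at end of stream" convention are hypothetical, chosen for illustration. It is one possible shape, not a proposed design: the point is that the reader must own a buffer, carry leftover bytes between calls, and copy each line out.

```d
import std.algorithm.comparison : min;
import std.stdio : writeln;

interface InputStream
{
    // Assumption: returns the number of bytes stored, 0 at end of stream.
    size_t read(ubyte[] buf);
}

// Hypothetical in-memory stream that delivers data in small chunks,
// standing in for a socket or file that transfers in blocks.
final class MemoryStream : InputStream
{
    private const(ubyte)[] data;
    private size_t chunk;

    this(const(ubyte)[] data, size_t chunk)
    {
        this.data = data;
        this.chunk = chunk;
    }

    size_t read(ubyte[] buf)
    {
        immutable n = min(buf.length, chunk, data.length);
        buf[0 .. n] = data[0 .. n];
        data = data[n .. $];
        return n;
    }
}

// Hypothetical line reader built only on read().
struct LineReader
{
    InputStream input;
    ubyte[] pending; // bytes already read but not yet returned
    bool eof;

    // Stores the next line (without '\n') in `line`; false when exhausted.
    bool nextLine(out ubyte[] line)
    {
        for (;;)
        {
            // Scan the leftover bytes first: one read may hold several lines.
            foreach (i, b; pending)
            {
                if (b == '\n')
                {
                    line = pending[0 .. i].dup;     // copy the line out
                    pending = pending[i + 1 .. $];  // keep the remainder
                    return true;
                }
            }
            if (eof)
            {
                if (pending.length == 0)
                    return false;
                line = pending.dup;                 // final unterminated line
                pending = null;
                return true;
            }
            ubyte[256] buf;                         // client-side buffer
            immutable n = input.read(buf[]);
            if (n == 0)
                eof = true;
            else
                pending ~= buf[0 .. n];             // second copy of the data
        }
    }
}

void main()
{
    auto text = cast(const(ubyte)[]) "first\nsecond\nthird";
    auto reader = LineReader(new MemoryStream(text, 5));
    ubyte[] line;
    while (reader.nextLine(line))
        writeln(cast(const(char)[]) line);
}
```

Each byte here is copied from the stream into buf, appended to pending, and copied again into the returned line. Whether a stream that exposes its own buffer would avoid some of those copies is exactly what this thread is arguing about.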

Andrei

