Streaming library
Daniel Gibson
metalcaedes at gmail.com
Wed Oct 13 13:10:00 PDT 2010
Denis Koroskin wrote:
> On Wed, 13 Oct 2010 20:55:04 +0400, Andrei Alexandrescu
> <SeeWebsiteForEmail at erdani.org> wrote:
>
>> On 10/13/10 11:16 CDT, Denis Koroskin wrote:
>>> On Wed, 13 Oct 2010 18:32:15 +0400, Andrei Alexandrescu
>>>> So far so good. I will point out, however, that the classic read/write
>>>> routines are not all that good. For example if you want to implement a
>>>> line-buffered stream on top of a block-buffered stream you'll be
>>>> forced to write inefficient code.
>>>>
>>>
>>> Never heard of filesystems that allow reading files in lines - they
>>> always read in blocks, and that's what streams should do.
>>
>> http://www.gnu.org/s/libc/manual/html_node/Buffering-Concepts.html
>>
>> I don't think streams must mimic the low-level OS I/O interface.
>>
>
> I, in contrast, think that Streams should be the lowest-level possible
> platform-independent abstraction.
> No buffering besides what an OS provides, no additional functionality.
> If you need to be able to read something up to some character (besides,
> what should be considered a new-line separator: \r, \n, \r\n?), this
> should be done manually in "byLine".
>
Platform-independent? OS-independent, yes. But being independent of endianness, the availability of
80-bit real, etc. is too much for a simple stream (of course we'd need an EndianStream that can wrap
a simple stream and take care of the endianness).
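A minimal sketch of what such an endianness adapter could look like. The name BigEndianReader and its readExact delegate are hypothetical and invented for this illustration; only bigEndianToNative and nativeToBigEndian are real functions, from std.bitmanip in current Phobos.

```d
import std.bitmanip : bigEndianToNative, nativeToBigEndian;

// Hypothetical adapter: decodes big-endian integers from any byte source.
// The underlying simple stream stays unaware of byte order.
struct BigEndianReader
{
    // Assumed to fill the whole buffer (or throw); supplied by the wrapped stream.
    void delegate(ubyte[] buf) readExact;

    int readInt()
    {
        ubyte[4] raw;
        readExact(raw[]);
        // Converts from wire order to whatever the host uses.
        return bigEndianToNative!int(raw);
    }
}
```

The wrapped stream only moves bytes; the adapter is the single place that knows about byte order, so a little-endian variant would be a second small adapter rather than a change to the stream.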
>>> That's because
>>> most of the streams are binary streams, and there is no such thing as a
>>> "line" in them (e.g. how often do you need to read a line from a
>>> SocketStream?).
>>
>> http://www.opengroup.org/onlinepubs/009695399/functions/isatty.html
>>
>
> These are special cases I don't like. There is no such thing in Windows
> anyway.
>
>> You need a line when e.g. you parse an HTML header or an email header or
>> an FTP response. Again, if at a low level the transfer occurs in
>> blocks, that doesn't mean the API must do the same at all levels.
>>
>
> BSD sockets transmit in blocks. If you need to find a special sequence
> in a socket stream, you are forced to fetch a chunk, and manually search
> for a needed sequence. My position is that you should do it with an
> external predicate (e.g. read until whitespace).
>
>>> I don't think streams should buffer anything either (what an underlying
>>> OS I/O API caches should suffice), buffered streams adapters can do that
>>> in a stream-independent way (why duplicate code when you can do that as
>>> efficiently with external methods?).
>>
>> Most OS primitives don't give access to their own internal buffers.
>> Instead, they ask user code to provide a buffer and transfer data into
>> it.
>
> Right. This is why Stream may not cache.
>
Simple streams should not cache, but there must be a BufferedStream wrapping simple streams.
When you read from a non-buffered SocketStream, each read() (like readInt()) is a syscall - that's
really expensive.
In my project I got a speedup of about a factor of 4-5 by replacing std.stream's SocketStream with a
custom BufferedSocketStream. I have to do further testing, but I think that shifted the bottleneck
from socket I/O to something else, so in other cases the speedup may be even bigger.
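To illustrate the wrapping idea, here is a rough sketch rather than the actual BufferedSocketStream from my project. RawInput, CountingSource and BufferedInput are hypothetical names made up for this example and are not part of std.stream. CountingSource stands in for a socket and counts how often the expensive underlying read is made.

```d
import std.algorithm : min;

// Hypothetical minimal interface; not the std.stream API.
interface RawInput
{
    // Reads up to buf.length bytes and returns the count; 0 means end of stream.
    size_t read(ubyte[] buf);
}

// Stand-in for a socket: every call to read() models one syscall.
class CountingSource : RawInput
{
    private const(ubyte)[] data;
    size_t calls;

    this(const(ubyte)[] data) { this.data = data; }

    size_t read(ubyte[] buf)
    {
        ++calls;
        immutable n = min(buf.length, data.length);
        buf[0 .. n] = data[0 .. n];
        data = data[n .. $];
        return n;
    }
}

// Wraps any RawInput and serves small reads from one block-sized buffer.
class BufferedInput : RawInput
{
    private RawInput source;
    private ubyte[] buffer;
    private size_t pos, filled;

    this(RawInput source, size_t capacity = 4096)
    {
        this.source = source;
        buffer = new ubyte[capacity];
    }

    size_t read(ubyte[] buf)
    {
        size_t done;
        while (done < buf.length)
        {
            if (pos == filled)
            {
                // Buffer exhausted: one underlying read refills it.
                filled = source.read(buffer);
                pos = 0;
                if (filled == 0) break; // end of stream
            }
            immutable n = min(buf.length - done, filled - pos);
            buf[done .. done + n] = buffer[pos .. pos + n];
            pos += n;
            done += n;
        }
        return done;
    }
}
```

With a 256-byte buffer over a 400-byte source, one hundred 4-byte reads cost two underlying reads instead of one hundred. Note that this sketch keeps reading until the caller's buffer is full, which suits fixed-size reads like readInt() but would block on an interactive socket.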
>> So clearly buffering on the client side is a must.
>>
>
> I don't see how that is implied by the above.
>
>>> Besides, as you noted, the buffering is redundant for byChunk/byLine
>>> adapter ranges. It means that byChunk/byLine should operate on
>>> unbuffered streams.
>>
>> Chunks keep their own buffer so indeed they could operate on streams
>> that don't do additional buffering. The story with lines is a fair
>> amount more complicated if it needs to be done efficiently.
>>
>
> Yes. But line-reading is a case that I don't see a need to be handled
> specially.
>
>>> I'll explain my I/O streams implementation below in case you didn't read
>>> my message (I've changed some stuff a little since then).
>>
>> Honest, I opened it to remember to read it but somehow your fonts are
>> small and make my eyes hurt.
>>
>>> My Stream
>>> interface is very simple:
>>>
>>> // A generic stream
>>> interface Stream
>>> {
>>> @property InputStream input();
>>> @property OutputStream output();
>>> @property SeekableStream seekable();
>>> @property bool endOfStream();
>>> void close();
>>> }
>>>
>>> You may ask, why separate Input and Output streams?
>>
>> I think my first question is: why doesn't Stream inherit InputStream
>> and OutputStream? My hypothesis: you want to sometimes return null. Nice.
>>
>
> Right.
>
>>> Well, that's because
>>> you either read from them, write to them, or both.
>>> Some streams are read-only (think Stdin), some write-only (Stdout), some
>>> support both, like FileStream. Right?
>>
>> Sounds good. But then where's flush()? Must be in OutputStream.
>>
>
> That's probably because unbuffered streams don't need it.
You may need to tell the OS to flush its buffer (fsync()).
>
>>
>> I'm surprised there's no flush().
>>
>
> No buffering - no flush.
See above.
Cheers,
- Daniel