Streaming library

Wed Oct 13 13:10:00 PDT 2010

Denis Koroskin wrote:
> On Wed, 13 Oct 2010 20:55:04 +0400, Andrei Alexandrescu 
> <SeeWebsiteForEmail at erdani.org> wrote:
> 
>> On 10/13/10 11:16 CDT, Denis Koroskin wrote:
>>> On Wed, 13 Oct 2010 18:32:15 +0400, Andrei Alexandrescu
>>>> So far so good. I will point out, however, that the classic read/write
>>>> routines are not all that good. For example if you want to implement a
>>>> line-buffered stream on top of a block-buffered stream you'll be
>>>> forced to write inefficient code.
>>>>
>>>
>>> Never heard of filesystems that allow reading files in lines - they
>>> always read in blocks, and that's what streams should do.
>>
>> http://www.gnu.org/s/libc/manual/html_node/Buffering-Concepts.html
>>
>> I don't think streams must mimic the low-level OS I/O interface.
>>
> 
> I, in contrast, think that Streams should be the lowest-level possible 
> platform-independent abstraction.
> No buffering besides what an OS provides, no additional functionality. 
> If you need to be able to read something up to some character (besides, 
> what should be considered a new-line separator: \r, \n, \r\n?), this 
> should be done manually in "byLine".
> 

Platform-independent? OS-independent, yes. But being independent of endianness, the availability of 
80-bit reals, etc. is too much for a simple stream (of course we'd need an EndianStream that can wrap a 
simple stream and take care of the endianness).
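
To make the EndianStream idea concrete, here is a minimal sketch of such a wrapper. The names ByteStream, MemoryStream and BigEndianStream are hypothetical and not part of std.stream or of Denis' proposal; only the std.bitmanip helpers are existing Phobos functions. It handles integers only; 80-bit reals would need their own treatment.

```d
import std.algorithm.comparison : min;
import std.bitmanip : bigEndianToNative, nativeToBigEndian;

// Hypothetical minimal byte-stream interface (an assumption for this sketch).
interface ByteStream
{
    size_t read(ubyte[] buffer);        // returns the number of bytes read
    size_t write(const(ubyte)[] bytes); // returns the number of bytes written
}

// In-memory stream, present only so the sketch can be exercised.
final class MemoryStream : ByteStream
{
    ubyte[] data;
    size_t pos;

    size_t read(ubyte[] buffer)
    {
        immutable n = min(buffer.length, data.length - pos);
        buffer[0 .. n] = data[pos .. pos + n];
        pos += n;
        return n;
    }

    size_t write(const(ubyte)[] bytes)
    {
        data ~= bytes;
        return bytes.length;
    }
}

// Wraps any ByteStream and fixes the byte order of multi-byte integers,
// so the simple stream underneath stays endianness-agnostic.
struct BigEndianStream
{
    ByteStream inner;

    void put(T)(T value)
    {
        ubyte[T.sizeof] raw = nativeToBigEndian(value);
        inner.write(raw[]);
    }

    T get(T)()
    {
        ubyte[T.sizeof] raw;
        immutable n = inner.read(raw[]);
        assert(n == T.sizeof, "short read");
        return bigEndianToNative!T(raw);
    }
}

unittest
{
    auto mem = new MemoryStream;
    auto s = BigEndianStream(mem);
    s.put!uint(0x01020304);

    ubyte[] expected = [1, 2, 3, 4]; // most significant byte first
    assert(mem.data == expected);
    assert(s.get!uint() == 0x01020304);
}
```

The point of the layering is that the simple stream only moves bytes, and the byte-order policy lives in one small adapter that can wrap a file, a socket or a memory stream alike.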

>>> That's because
>>> most of the streams are binary streams, and there is no such thing as a
>>> "line" in them (e.g. how often do you need to read a line from a
>>> SocketStream?).
>>
>> http://www.opengroup.org/onlinepubs/009695399/functions/isatty.html
>>
> 
> These are special cases I don't like. There is no such thing in Windows 
> anyway.
> 
>> You need a line when e.g. you parse an HTML header or an email header or 
>> an FTP response. Again, if at a low level the transfer occurs in 
>> blocks, that doesn't mean the API must do the same at all levels.
>>
> 
> BSD sockets transmit in blocks. If you need to find a special sequence 
> in a socket stream, you are forced to fetch a chunk, and manually search 
> for a needed sequence. My position is that you should do it with an 
> external predicate (e.g. read until whitespace).
> 
>>> I don't think streams should buffer anything either (what an underlying
>>> OS I/O API caches should suffice), buffered streams adapters can do that
>>> in a stream-independent way (why duplicate code when you can do that as
>>> efficiently with external methods?).
>>
>> Most OS primitives don't give access to their own internal buffers. 
>> Instead, they ask user code to provide a buffer and transfer data into 
>> it.
> 
> Right. This is why Stream may not cache.
> 

Simple streams should not cache, but there must be a BufferedStream that wraps simple streams.
When you read from a non-buffered SocketStream, each read() (like readInt()) is a syscall, and that's 
really expensive.
In my project I got a speedup of about a factor of 4-5 by replacing std.Streams SocketStream with a 
custom BufferedSocketStream. I have to do further testing, but I think that shifted the bottleneck 
from socket I/O to something else, so in other cases the speedup may be even bigger.
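
Roughly what I mean by a buffering wrapper, as a sketch rather than my actual BufferedSocketStream: RawSource, CountingSource and BufferedReader are hypothetical names, and CountingSource merely counts calls to stand in for syscalls. Many small reads are served from one large block fetched from the underlying stream.

```d
import std.algorithm.comparison : min;

// Hypothetical unbuffered source; each read() stands in for one syscall.
interface RawSource
{
    size_t read(ubyte[] buffer); // returns bytes read, 0 at end of stream
}

// Test double that counts how often the "syscall" is made.
final class CountingSource : RawSource
{
    const(ubyte)[] data;
    size_t calls;

    this(const(ubyte)[] data) { this.data = data; }

    size_t read(ubyte[] buffer)
    {
        ++calls;
        immutable n = min(buffer.length, data.length);
        buffer[0 .. n] = data[0 .. n];
        data = data[n .. $];
        return n;
    }
}

// Serves small reads from one large block fetched from the source.
struct BufferedReader
{
    RawSource source;
    ubyte[] buffer;    // backing storage
    size_t start, end; // valid bytes are buffer[start .. end]

    this(RawSource source, size_t capacity = 4096)
    {
        this.source = source;
        buffer = new ubyte[capacity];
    }

    // Fills `dest` completely; returns false if the stream ends first.
    bool readExact(ubyte[] dest)
    {
        while (dest.length > 0)
        {
            if (start == end) // buffer is empty: one big read refills it
            {
                start = 0;
                end = source.read(buffer);
                if (end == 0)
                    return false;
            }
            immutable n = min(dest.length, end - start);
            dest[0 .. n] = buffer[start .. start + n];
            start += n;
            dest = dest[n .. $];
        }
        return true;
    }
}

unittest
{
    auto payload = new ubyte[400]; // room for 100 four-byte values
    auto raw = new CountingSource(payload);
    auto reader = BufferedReader(raw, 4096);

    ubyte[4] word;
    foreach (i; 0 .. 100)
        assert(reader.readExact(word[]));

    assert(raw.calls == 1); // one underlying read instead of 100
}
```

With a real socket the refill may return fewer bytes than the buffer holds; the loop in readExact already copes with that by refilling again when the buffer runs dry.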

>> So clearly buffering on the client side is a must.
>>
> 
> I don't see how that follows from the above.
> 
>>> Besides, as you noted, the buffering is redundant for byChunk/byLine
>>> adapter ranges. It means that byChunk/byLine should operate on
>>> unbuffered streams.
>>
>> Chunks keep their own buffer so indeed they could operate on streams 
>> that don't do additional buffering. The story with lines is a fair 
>> amount more complicated if it needs to be done efficiently.
>>
> 
> Yes. But line-reading is a case that I don't see a need to be handled 
> specially.
> 
>>> I'll explain my I/O streams implementation below in case you didn't read
>>> my message (I've changed some stuff a little since then).
>>
>> Honest, I opened it to remember to read it but somehow your fonts are 
>> small and make my eyes hurt.
>>
>>> My Stream
>>> interface is very simple:
>>>
>>> // A generic stream
>>> interface Stream
>>> {
>>> @property InputStream input();
>>> @property OutputStream output();
>>> @property SeekableStream seekable();
>>> @property bool endOfStream();
>>> void close();
>>> }
>>>
>>> You may ask, why separate Input and Output streams?
>>
>> I think my first question is: why doesn't Stream inherit InputStream 
>> and OutputStream? My hypothesis: you want to sometimes return null. Nice.
>>
> 
> Right.
> 
>>> Well, that's because
>>> you either read from them, write to them, or both.
>>> Some streams are read-only (think Stdin), some write-only (Stdout), some
>>> support both, like FileStream. Right?
>>
>> Sounds good. But then where's flush()? Must be in OutputStream.
>>
> 
> That's probably because unbuffered streams don't need them.

You may need to tell the OS to flush its buffer (fsync()).
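
A POSIX-only sketch of what that could look like: the class UnbufferedFileOutput and its method names are made up for illustration, while open, write, fsync and close are the usual POSIX calls from druntime's core.sys.posix bindings. The stream keeps no buffer of its own, yet flush() still has a job to do.

```d
import core.sys.posix.fcntl : open, O_CREAT, O_TRUNC, O_WRONLY;
import core.sys.posix.unistd : fsync, posixClose = close, posixWrite = write;
import std.conv : octal;
import std.string : toStringz;

// Hypothetical unbuffered output stream: every put() goes straight to write().
final class UnbufferedFileOutput
{
    private int fd = -1;

    this(string path)
    {
        fd = open(path.toStringz, O_WRONLY | O_CREAT | O_TRUNC, octal!644);
        if (fd < 0)
            throw new Exception("open failed: " ~ path);
    }

    // No user-space buffer; returns how many bytes the OS accepted.
    size_t put(const(ubyte)[] data)
    {
        immutable n = posixWrite(fd, data.ptr, data.length);
        if (n < 0)
            throw new Exception("write failed");
        return cast(size_t) n;
    }

    // Nothing of our own to flush, but the OS cache still holds the data.
    void flush()
    {
        if (fsync(fd) != 0)
            throw new Exception("fsync failed");
    }

    void close()
    {
        if (fd >= 0)
        {
            posixClose(fd);
            fd = -1;
        }
    }
}

unittest
{
    import std.file : read, remove, tempDir;
    import std.path : buildPath;

    // The file name is arbitrary and only used for this demonstration.
    immutable path = buildPath(tempDir, "unbuffered_flush_demo.bin");
    auto output = new UnbufferedFileOutput(path);
    scope (exit) remove(path);

    output.put([1, 2, 3]);
    output.flush();
    output.close();

    ubyte[] expected = [1, 2, 3];
    assert(cast(ubyte[]) read(path) == expected);
}
```

On Windows the analogous call would be FlushFileBuffers; the sketch leaves that out.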

> 
>>
>> I'm surprised there's no flush().
>>
> 
> No buffering - no flush.

See above.

Cheers,
- Daniel