Streaming library

Wed Oct 13 14:05:55 PDT 2010

On Thu, 14 Oct 2010 00:19:45 +0400, Andrei Alexandrescu  
<SeeWebsiteForEmail at erdani.org> wrote:

> On 10/13/10 14:02 CDT, Denis Koroskin wrote:
>> On Wed, 13 Oct 2010 20:55:04 +0400, Andrei Alexandrescu
>> <SeeWebsiteForEmail at erdani.org> wrote:
>>> http://www.gnu.org/s/libc/manual/html_node/Buffering-Concepts.html
>>>
>>> I don't think streams must mimic the low-level OS I/O interface.
>>>
>>
>> I, in contrast, think that streams should be the lowest-level
>> platform-independent abstraction possible.
>> No buffering beyond what the OS provides, no additional functionality.
>> If you need to read something up to some character (besides,
>> what should be considered a newline separator: \r, \n, or \r\n?), this
>> should be done manually in "byLine".
>
> This aggravates client code for the sake of simplicity in a library that  
> was supposed to make streaming easy. I'm not seeing progress.
>

This library code has to live somewhere. I just believe it belongs in the  
line reader, not in a generic stream. Putting line reading into the stream  
interface won't make it any more efficient.
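To make this concrete: the delimiter search can live in a separate helper driven by an external predicate, with the stream exposing nothing but read(). Below is a minimal D sketch under that assumption. readUntil and the ArrayStream test double are hypothetical names, not part of any existing library; InputStream is the one-method interface quoted later in this thread.

```d
import std.algorithm.comparison : min;
import std.algorithm.searching : countUntil;

// The minimal interface quoted in this thread: read() fills buf and returns
// the number of bytes read, 0 at end of stream.
interface InputStream
{
    size_t read(ubyte[] buf);
}

// Hypothetical test double: serves a fixed byte array, at most maxChunk
// bytes per read, to mimic a socket delivering data in small blocks.
class ArrayStream : InputStream
{
    private const(ubyte)[] data;
    private size_t maxChunk;

    this(const(ubyte)[] data, size_t maxChunk = 3)
    {
        this.data = data;
        this.maxChunk = maxChunk;
    }

    size_t read(ubyte[] buf)
    {
        auto n = min(buf.length, maxChunk, data.length);
        buf[0 .. n] = data[0 .. n];
        data = data[n .. $];
        return n;
    }
}

// Hypothetical helper: returns the bytes preceding the first byte for which
// pred is true and consumes that byte. Bytes fetched past the match stay in
// `leftover` for the next call. At end of stream it returns what remains.
ubyte[] readUntil(alias pred)(InputStream stream, ref ubyte[] leftover)
{
    ubyte[] result;
    ubyte[256] chunk;
    while (true)
    {
        auto pos = leftover.countUntil!pred;
        if (pos >= 0)
        {
            result ~= leftover[0 .. pos];
            leftover = leftover[pos + 1 .. $];
            return result;
        }
        result ~= leftover;
        const n = stream.read(chunk[]);
        if (n == 0)
        {
            leftover = null;
            return result;
        }
        leftover = chunk[0 .. n].dup; // copy: chunk is reused on the next read
    }
}
```

For example, readUntil!(c => c == ' ')(stream, leftover) splits a request line at spaces. Note that the caller still owns `leftover`, so some client-side buffering remains even in this design.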

>>>> That's because
>>>> most of the streams are binary streams, and there is no such thing as a
>>>> "line" in them (e.g. how often do you need to read a line from a
>>>> SocketStream?).
>>>
>>> http://www.opengroup.org/onlinepubs/009695399/functions/isatty.html
>>>
>>
>> These are special cases I don't like. There is no such thing in Windows
>> anyway.
>
> I didn't say I like them. Windows has _isatty:  
> http://msdn.microsoft.com/en-us/library/f4s0ddew(v=VS.80).aspx
>

I stand corrected. Windows pretends to be POSIX-compliant, yes, but that's  
a sad story to tell. I don't see why would

>>> You need a line when e.g. you parse an HTTP header, an email header, or
>>> an FTP response. Again, if at a low level the transfer occurs in
>>> blocks, that doesn't mean the API must do the same at all levels.
>>>
>>
>> BSD sockets transmit data in blocks. If you need to find a special sequence
>> in a socket stream, you are forced to fetch a chunk and manually search it
>> for the sequence you need. My position is that you should do it with an
>> external predicate (e.g. read until whitespace).
>
> Problem is how you set up interfaces to avoid inefficiencies and  
> contortions in the client.
>
>>>> I don't think streams should buffer anything either (what the underlying
>>>> OS I/O API caches should suffice); buffered stream adapters can do that
>>>> in a stream-independent way (why duplicate code when you can do it just
>>>> as efficiently with external methods?).
>>>
>>> Most OS primitives don't give access to their own internal buffers.
>>> Instead, they ask user code to provide a buffer and transfer data into
>>> it.
>>
>> Right. This is why Stream may not cache.
>
> This is a big misunderstanding. If the interface is:
>
> size_t read(byte[] buffer);
>
> then *I*, the client, need to provide the buffer. It's in client space.  
> This means willing or not I need to do buffering, regardless of whatever  
> internal buffering is going on under the wraps.
>

Use a BufferedStream adapter if you need buffering, and a raw stream if you  
do the buffering yourself.
That's how it's done in C#, Java, Tango, and many, many other  
APIs.
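A minimal sketch of such an adapter in D, assuming the one-method InputStream interface from this thread. BufferedStream here is a hypothetical class, not Tango's or any library's actual API, and peek()/consume() are invented names for exposing the internal buffer, so that a line reader could scan it in place instead of copying into a buffer of its own.

```d
import std.algorithm.comparison : min;

interface InputStream
{
    size_t read(ubyte[] buf);
}

// Hypothetical test double: hands out as much of a fixed array as fits in buf.
class ArrayStream : InputStream
{
    private const(ubyte)[] data;

    this(const(ubyte)[] data) { this.data = data; }

    size_t read(ubyte[] buf)
    {
        auto n = min(buf.length, data.length);
        buf[0 .. n] = data[0 .. n];
        data = data[n .. $];
        return n;
    }
}

// Hypothetical buffering adapter. It is itself an InputStream, so it can wrap
// any raw stream; peek/consume are invented names for this sketch.
class BufferedStream : InputStream
{
    private InputStream source;
    private ubyte[] buffer;
    private size_t begin, end; // unread bytes are buffer[begin .. end]

    this(InputStream source, size_t bufferSize = 4096)
    {
        this.source = source;
        this.buffer = new ubyte[bufferSize];
    }

    // Returns the unread buffered bytes, refilling from the source when the
    // buffer is empty. An empty result means end of stream.
    const(ubyte)[] peek()
    {
        if (begin == end)
        {
            begin = 0;
            end = source.read(buffer);
        }
        return buffer[begin .. end];
    }

    // Marks the first n peeked bytes as consumed.
    void consume(size_t n)
    {
        assert(n <= end - begin);
        begin += n;
    }

    // Plain InputStream behaviour, served from the internal buffer.
    size_t read(ubyte[] buf)
    {
        auto avail = peek();
        auto n = min(buf.length, avail.length);
        buf[0 .. n] = avail[0 .. n];
        consume(n);
        return n;
    }
}
```

Because the adapter is itself an InputStream, raw and buffered streams stay interchangeable, and the buffering code is written once rather than once per stream type.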

>>> So clearly buffering on the client side is a must.
>>>
>>
>> I don't see how that follows from the above.
>
> Please implement an abstraction that given this:
>
> interface InputStream
> {
>      size_t read(ubyte[] buf);
> }
>
> defines a line reader.
>

I thought we agreed that byLine/byChunk need to do buffering manually  
anyway.

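import std.algorithm.searching : countUntil;
import std.array : Appender;

// The interface from above; read() returns 0 at end of stream.
interface InputStream
{
	size_t read(ubyte[] buf);
}

class ByLine
{
	enum BUFFER_SIZE = 4096;

	this(InputStream inputStream)
	{
		this.inputStream = inputStream;
	}

	// Stores the next line (without its '\n') in `line`;
	// returns false once the stream is exhausted.
	bool nextLine(out ubyte[] line)
	{
		while (true) {
			// Bytes left over from the previous call may already hold a full line.
			auto i = countUntil(pending.data, '\n');
			if (i >= 0) {
				line = pending.data[0..i].dup;
				ubyte[] rest = pending.data[i+1..$].dup;
				pending.clear();
				pending.put(rest);
				return true;
			}

			// No newline yet: fetch another block into the client-side buffer.
			size_t bytesRead = inputStream.read(buffer[]);
			if (bytesRead == 0) {
				break;
			}
			pending.put(buffer[0..bytesRead]);
		}

		// End of stream: return the unterminated last line, if any.
		if (pending.data.length == 0) {
			return false;
		}
		line = pending.data.dup;
		pending.clear();
		return true;
	}

private:
	InputStream inputStream;
	ubyte[BUFFER_SIZE] buffer;   // buffer handed to read()
	Appender!(ubyte[]) pending;  // bytes read but not yet returned
}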

(I've skipped the range interface for the sake of simplicity and replaced it  
with a nextLine() function. I also don't remember the exact appender  
interface, so some of the function names may be imaginary.)
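The ByLine code above splits on '\n' only. Handling all three conventions raised earlier in the thread (\r, \n, \r\n) is also line-reader logic rather than stream logic. A minimal sketch, with LineEnd and findLineEnd as hypothetical names:

```d
// Hypothetical result type: where the first line ends in a block of data.
struct LineEnd
{
    size_t index;  // length of the line content
    size_t length; // terminator length: 1 or 2, or 0 if none was found
}

// Hypothetical helper: accepts "\n", "\r\n" or a lone "\r" as a terminator.
LineEnd findLineEnd(const(ubyte)[] data)
{
    foreach (i, c; data)
    {
        if (c == '\n')
            return LineEnd(i, 1);
        if (c == '\r')
        {
            // "\r\n" counts as one terminator, not two.
            // Caveat: a '\r' at the very end of `data` is reported as a lone
            // '\r', even if the next block happens to start with '\n'.
            if (i + 1 < data.length && data[i + 1] == '\n')
                return LineEnd(i, 2);
            return LineEnd(i, 1);
        }
    }
    return LineEnd(data.length, 0);
}
```

Note the block-boundary caveat: when a block ends in '\r', a real line reader would have to hold that byte back until the next read (or end of stream) tells it whether '\n' follows.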

Once again, what's the point of byLine if all it does is call  
stream.readLine()? That moves code from one place into many unrelated  
ones. I don't agree with that.

I'm not convinced we need a line-based API at the core stream level. I don't  
think we should sacrifice performance in the general case to avoid a  
performance hit in a special case. And who says it will be any less  
efficient this way?

>
> Andrei