Streaming transport interfaces: input

Thu Oct 14 11:14:50 PDT 2010

On Thu, 14 Oct 2010 13:39:03 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail at erdani.org> wrote:

> On 10/14/10 12:27 CDT, Steven Schveighoffer wrote:
>> On Thu, 14 Oct 2010 11:34:12 -0400, Andrei Alexandrescu
>> <SeeWebsiteForEmail at erdani.org> wrote:
>> Please, use the term "seek", and allow an anchor. Every OS allows this,
>> it makes no sense not to provide it.
>
> I've always thought that's a crappy appendix. Every OS that ever allows  
> seek/tell with anchors allows ALL anchors, and always allows either both  
> or none of seek and tell. So I decided to cut through the crap and  
> simplify. You want to seek 100 bytes from here, you write  
> stream.position = stream.position + 100.

Um... yuck.  We need two system calls to seek 100 bytes?

>
> Oh, that reminds me I need to provide length as a property as well. This  
> would save us crap like seek(0, SEEK_END); ftell() to figure out the  
> length of a file.

So now you need to do stream.position = stream.length to seek to the end
of the file instead of stream.seek(0, Anchor.END)?  Plus, how will you
implement length?  Probably like this:

auto curpos = seek(0, SEEK_CUR);
auto len = seek(0, SEEK_END);
seek(curpos, SEEK_SET);
return len;

So that looks like 3 system calls instead of one, plus you just wasted  
time seeking back to the current position.
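
To make the call counting concrete, here is a minimal sketch in D.  FakeFile,
Anchor and the syscalls counter are hypothetical names invented for
illustration; seek() stands in for one lseek-style system call, and the
position/length properties are layered on top of it the way I am assuming
they would have to be.

```d
enum Anchor { begin, current, end }

/// Hypothetical stand-in for an OS file handle.
/// Every seek() models exactly one lseek-style system call.
struct FakeFile
{
    long size;      // length of the pretend file
    long offset;    // current position as the OS tracks it
    int syscalls;   // number of seek() calls made so far

    long seek(long delta, Anchor anchor)
    {
        ++syscalls;
        final switch (anchor)
        {
            case Anchor.begin:   offset = delta;        break;
            case Anchor.current: offset += delta;       break;
            case Anchor.end:     offset = size + delta; break;
        }
        return offset;
    }

    // Property-style interface built purely on top of seek().
    @property long position() { return seek(0, Anchor.current); }
    @property void position(long p) { seek(p, Anchor.begin); }

    @property long length()
    {
        auto cur = seek(0, Anchor.current); // remember where we are
        auto len = seek(0, Anchor.end);     // jump to the end to learn the size
        seek(cur, Anchor.begin);            // jump back
        return len;
    }
}

unittest
{
    auto f = FakeFile(1000, 10);

    f.position = f.position + 100;   // getter + setter: two calls
    assert(f.offset == 110 && f.syscalls == 2);

    f.seek(100, Anchor.current);     // anchored seek: one call
    assert(f.offset == 210 && f.syscalls == 3);

    auto len = f.length;             // three calls, position restored
    assert(len == 1000 && f.offset == 210 && f.syscalls == 6);
}
```

An implementation that caches the position in user space could avoid the
getter call; that optimization is not modelled here.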

Now, I do like the syntax better, but we are not going for syntax here, we  
are going for performance.  If you can find a way to merge the two, I'd be  
happy with it.

> I have no sympathy for seek and tell with anchors.

Sympathy has nothing to do with it.  It's the simple fact that you have to  
deal with the OS, and munging your interface on top of it means more  
system calls and less performance.

>> I don't like appendDelim. We don't need to define that until we have
>> buffering.
>
> Why?

Because appendDelim deals with buffering.  If I defined a buffered stream,  
I'd include a function like this:

size_t read(bool delegate(T[] data) sink);

which buffers data until sink returns false (passing each read chunk into
sink), extending the buffer as necessary.

Then it's trivial to implement readDelim on top of this.
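
As a rough sketch of that idea (one possible reading of the signature, not a
finished design): Buffered, Source, chunked, findByte, window, consume and
readLine are all hypothetical names.  The sink sees each newly read chunk in
place, and readLine hands back a slice of the buffer itself, so the line is
never copied.

```d
import std.algorithm.comparison : min;

/// Hypothetical unbuffered primitive: fills dst, returns bytes read (0 = EOF).
alias Source = size_t delegate(ubyte[] dst);

struct Buffered
{
    Source source;
    ubyte[] buf;        // backing storage, grown on demand
    size_t start, end;  // unconsumed data is buf[start .. end]

    const(ubyte)[] window() { return buf[start .. end]; }

    void consume(size_t n)
    {
        start += n;
        if (start == end) start = end = 0; // buffer empty: reuse it from the front
    }

    /// Reads chunks into the buffer, passing each new chunk to sink,
    /// until sink returns false or the source hits EOF.
    /// Returns the number of unconsumed bytes now buffered.
    size_t read(scope bool delegate(const(ubyte)[] data) sink)
    {
        for (;;)
        {
            if (end == buf.length)
                buf.length = buf.length ? buf.length * 2 : 16;
            size_t n = source(buf[end .. $]);
            if (n == 0) break;
            end += n;
            if (!sink(buf[end - n .. end])) break;
        }
        return end - start;
    }
}

ptrdiff_t findByte(const(ubyte)[] data, char c)
{
    foreach (i, b; data)
        if (b == c) return cast(ptrdiff_t) i;
    return -1;
}

/// Next line without its '\n', as a slice into the buffer.
/// The slice is only valid until the next call; empty at EOF.
const(ubyte)[] readLine(ref Buffered b)
{
    ptrdiff_t nl = findByte(b.window, '\n');
    if (nl < 0)
    {
        size_t scanned = b.window.length;
        b.read((const(ubyte)[] chunk) {
            auto i = findByte(chunk, '\n');
            if (i < 0) { scanned += chunk.length; return true; }
            nl = cast(ptrdiff_t)(scanned + i);
            return false;
        });
    }
    auto w = b.window;
    if (nl < 0) { b.consume(w.length); return w; } // EOF: unterminated tail
    b.consume(nl + 1);
    return w[0 .. nl];
}

/// Test helper: serves data in pieces of at most chunk bytes.
Source chunked(const(ubyte)[] data, size_t chunk)
{
    size_t pos = 0;
    return (ubyte[] dst) {
        size_t n = min(chunk, dst.length, data.length - pos);
        dst[0 .. n] = data[pos .. pos + n];
        pos += n;
        return n;
    };
}

unittest
{
    auto b = Buffered(chunked(cast(const(ubyte)[]) "ab\ncdefgh\nij", 4));
    assert(cast(const(char)[]) readLine(b) == "ab");
    assert(cast(const(char)[]) readLine(b) == "cdefgh");
    assert(cast(const(char)[]) readLine(b) == "ij");
    assert(readLine(b).length == 0);
}
```

The only copy is the one the source makes when it fills the buffer.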

>
>> The simple function of an input stream is to read data.
>
> It does read data.

I mean, that's *all* it should do.  It should not be appending to buffers.

>
>> With
>> buffering you get all the goodies that you want, but the buffer should
>> be in control of its data buffer.
>
> I think the appendDelim method allows fast and simple implementations of  
> a variety of patterns. As I (thought I) have shown elsethread, without  
> appendDelim there's no way to efficiently implement a line-oriented  
> stream on top of a block-oriented one.

Um... the read system call is the same interface as the proposed  
block-oriented interface.  How are you avoiding using system calls?

>
>> Basically, appendDelim can be defined outside this class, because the
>> primitive read is enough.
>
> You can only define it if you accept extra copying. I'd say one extra  
> interface function is acceptable for fast I/O.

No, you can define it without extra copying.  If you don't allow direct  
access to the buffer, then you have extra copying.  But we don't have to  
mimic C here.  We should not be encouraging constant reinventing of the  
buffer wheel here.  Buffering is a well-defined task that can be  
implemented once.

Just as a note, Tango does this, and it's very fast.  There is certainly  
no extra copying there.

>> Shouldn't the text transport be defined on top of the binary transport?
>
> No, because there are transports that genuinely do not accept binary  
> data.

I mean, a text transport uses a binary transport underneath.  What text  
transport doesn't use a binary transport to do its dirty work?  And what  
exactly does a text transport do so differently that it needs to be a  
separate interface?

In other words, if 90% of the text transport duplicates the binary  
transport, I see an opportunity for consolidation.

>> And in any case, I'd expect buffering to go between the two.
>
> How do you define buffering? Would a buffered transport implement a  
> different interface?

Yes, but if we expect to reuse code, I'd expect a buffered transport to  
use a primitive transport underneath for actually reading/writing data.   
If you have multiple versions of the class that actually reads/writes data  
(such as binary vs. text), then the buffer which uses it must support all  
of them.

Text based processing to me seems to be a buffered activity (reading  
lines, ensuring you don't have sliced utf-8 data, etc.).
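
For example, a buffer that is about to hand bytes to a text layer has to hold
back a trailing partial UTF-8 sequence until the rest arrives.  A minimal
sketch of that check follows; completePrefix is a hypothetical helper, and it
deliberately does not validate the data, it only finds a safe cut point.

```d
/// Length of the longest prefix of data that does not end in the
/// middle of a UTF-8 sequence. Malformed input is passed through
/// unchanged so that a later decoding step can report it.
size_t completePrefix(const(ubyte)[] data)
{
    size_t i = data.length;
    size_t back = 0; // bytes examined from the end so far

    // A sequence is at most 4 bytes, so look back at most 4 bytes.
    while (i > 0 && back < 4)
    {
        --i;
        ++back;
        const ubyte b = data[i];
        if ((b & 0xC0) != 0x80) // not a continuation byte: ASCII or a lead byte
        {
            const size_t need =
                b < 0x80           ? 1 :
                (b & 0xE0) == 0xC0 ? 2 :
                (b & 0xF0) == 0xE0 ? 3 :
                (b & 0xF8) == 0xF0 ? 4 : 1;
            // Enough bytes after the lead byte? Then the tail is complete.
            return back >= need ? data.length : i;
        }
    }
    return data.length; // empty input, or no lead byte within 4 bytes
}

unittest
{
    auto euro = cast(const(ubyte)[]) "a€"; // 0x61, then 0xE2 0x82 0xAC
    assert(euro.length == 4);
    assert(completePrefix(euro) == 4);           // whole sequence present
    assert(completePrefix(euro[0 .. 3]) == 1);   // cut after 2 of 3 bytes
    assert(completePrefix(euro[0 .. 2]) == 1);   // cut after the lead byte
    assert(completePrefix(euro[0 .. 1]) == 1);   // plain ASCII
}
```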

>> If all you
>> are adding are the different widths of characters, I don't think you
>> need this extra layer. It's going to make the buffering layer more
>> difficult to implement (now it must handle both a text version and
>> a binary version).
>
> I don't understand this.

A buffer uses a transport.  If you have two different transport interfaces,
the buffer must support them both.  And if the only benefit is that one
defines [w|d]char versions of read, then we haven't gained much for the
trouble of having to support both.

-Steve