[phobos] Fwd: [Issue 4025] New: Making network with the std.stdio.File interface

Steve Schveighoffer schveiguy at yahoo.com
Thu Apr 8 13:01:51 PDT 2010





----- Original Message ----
> From: Andrei Alexandrescu <andrei at erdani.com>
> 
> On 04/08/2010 01:23 PM, Steve Schveighoffer wrote:
>> The network socket is not a range, it's a File, and File does have
>> primitives such as rawWrite and rawRead, which we can add to and
>> improve.
>> 
>> File offers ranges, but you're not required to use them.
> 
> That's not what I read from Walter's comment...  He indicated that
> something like an e.g. zip library should take a range as input.
> This implies that all streams are shoehorned into range form.
> 
> If the zip library works with ranges, we can use it for transparently
> handling in-memory zip manipulation and also zip file manipulation.

Yes, from a library perspective, everything as a range works well.  The problem is whether the range interface lends itself well to things that need streams, like zip.  Basically, you didn't answer the 'if zip can use ranges' part.  That's the part I'm more concerned about.

>>> Makes sense. I'm just a bit worried about stdio's poor buffering
>>> interface. It only offers setvbuf(), which is quite opaque.
>> 
>> The only reason to use FILE * as the underlying implementation is to
>> be compatible with C's (f)printf.  It makes sense that you only need
>> that compatibility for printing to a standard handle.  I think we can
>> probably come up with an abstraction layer that uses FILE* only when
>> dealing with standard handles.
> 
> It's more than printf. There are several I/O routines in stdio, and all
> use FILE* for both input and output. If a D application mixes calls to
> C APIs that do I/O with stdin, stdout, and stderr, we need to take a
> stance on what should happen.

But I'm saying, the times where we need to intermingle with C are only for the standard handles, it seems that's what you're saying also, but you worded it in a way that makes it sound like you disagree with me...  Confused.

> I don't think that accurately represents what's going on. rawRead does
> need a fair amount of paraphernalia to work. For example:
> 
> // Consume input using rawRead
> auto buffer = new ubyte[1024];
> size_t read;
> while ((read = input.rawRead(buffer).length) > 0) {
>    auto usable = buffer[0 .. read];
>    ... use usable ...
> }
> 
> Not that elegant. Compare and contrast with:
> 
> // Consume input using a range
> foreach (buffer; input.byChunk(1024)) {
>     ... use buffer ...
> }
> 
> // Consume input straight from a range
> input.bufsize = 1024;
> foreach (buffer; input) {
>     ... use buffer ...
> }

Yes, if your application processes 1024 bytes at a time, it is easier to use a range.  That's not the application I'm referring to.  The application I'm talking about is one where you need to read a different number of bytes per read, such as a varying-length packet.  This is not an uncommon situation.

Let's look at that version with your range:

while(!input.empty())
{
   input.bufsize = numtoread;
   input.popFront();
   auto data = input.front();

   // process data.
}

and with File's rawRead:

ubyte[MAXSIZE] buf;
ubyte[] data;
while((data = input.rawRead(buf[0..numtoread])).length)
{
   // process data.
}

And look, we can use the stack for buffering!  Plus, we don't have to worry about whether the data buffer will be overwritten; we control which buffer the input object uses, so we can manage it less defensively.

Also, let's not forget that you can easily bolt an input range interface on top of a file interface (as evidenced by byChunk), but you can't do the opposite.  For example, reading a packet at a time from a network/file stream given a length can easily be implemented with a range on top of a File struct, but not easily with a range on top of a range.
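To illustrate the "bolt a range on top of a File" direction, here is a sketch of such a wrapper.  Hedged heavily: ByPacket and the 4-byte native-endian length-prefix framing are invented for illustration and are not a Phobos API; only File.rawRead is real.

```d
import std.stdio;

// Hypothetical wrapper (not a Phobos API): expose a File of
// length-prefixed packets as an input range.  Each packet is a
// 4-byte native-endian length followed by that many bytes.
struct ByPacket
{
    File source;
    ubyte[] packet;   // current front
    ubyte[4] lenbuf;  // scratch space for the length prefix
    bool done;

    this(File f)
    {
        source = f;
        popFront(); // prime the first packet
    }

    @property bool empty() { return done; }
    @property ubyte[] front() { return packet; }

    void popFront()
    {
        // rawRead returns the slice it actually filled; a short
        // read on the prefix means we hit end of file.
        auto hdr = source.rawRead(lenbuf[]);
        if (hdr.length < 4) { done = true; return; }
        uint len = *cast(uint*) hdr.ptr; // assumes native byte order
        if (len == 0) { packet = null; return; }
        packet = source.rawRead(new ubyte[len]);
        if (packet.length < len) done = true; // truncated trailing packet
    }
}
```

Usage would be `foreach (p; ByPacket(File("data.bin"))) { /* use p */ }` -- the read size is decided per packet, inside popFront, which is exactly what a fixed-chunk range can't do without extra copying.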

> // read N bytes
> source.bufsize = N;
> auto data = source.front();
> source.popFront();
> 
> I think it's more often to want to consume stuff in a stream manner, as
> opposed to attempting to read some isolated bits. Ranges are optimized
> for the former.

So essentially, the idea is to double-buffer the data: once inside the range (to support the front/popFront regime) and once in your application, so you can build up enough "chunks" to read the data correctly?  I don't see how this moves us towards high performance.  One litmus test: if whatever we come up with uses more than one buffer, it is not good enough.
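The double-buffering can be made concrete.  In this sketch, byChunk is the real std.stdio range; readRecord and the record framing are hypothetical names invented for illustration.  The chunk range owns one buffer, and the application must keep a second one to stitch variable-length records together:

```d
import std.stdio;

// Assemble an n-byte record from a fixed-size chunk range.
// The range reuses its own internal chunk buffer, so the
// application must copy into a second buffer of its own --
// the double buffering in question.
ubyte[] readRecord(R)(ref R chunks, ref ubyte[] leftover, size_t n)
{
    ubyte[] record = leftover; // application-side buffer
    while (record.length < n && !chunks.empty)
    {
        record ~= chunks.front; // copy out of the range's buffer
        chunks.popFront();
    }
    auto take = n < record.length ? n : record.length;
    leftover = record[take .. $]; // carry surplus to the next record
    return record[0 .. take];    // short only at end of input
}
```

Contrast with the rawRead version above, where `input.rawRead(buf[0..numtoread])` reads exactly the bytes wanted into the one application-controlled buffer, with no intermediate copy.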

> We need to figure out all this stuff together, but so far I'm not at
> all convinced that seekable ranges are awkward.

I may not have explained myself well.  I don't have a big problem with seekable ranges for certain applications; I just don't think they are the primitive that should be used for all applications.

-Steve


