buffered input

Jonathan M Davis jmdavisProg at gmx.com
Sun Feb 6 00:22:08 PST 2011


On Saturday 05 February 2011 12:57:21 Jonathan M Davis wrote:
> On Saturday 05 February 2011 07:16:45 Andrei Alexandrescu wrote:
> > On 2/5/11 5:09 AM, Jonathan M Davis wrote:
> > > Hmm. I think that I'd have to have an actual implementation to mess
> > > around with to say much. My general take on buffered input is that I
> > > don't want to worry about it. I want it to be buffered so that it's
> > > more efficient, but I don't want to have to care about it in how I use
> > > it. I would have expected a buffered input range to be exactly the
> > > same as an input range except that it doesn't really just pull in one
> > > character behind the scenes. It pulls in 1024 or whatever when
> > > popFront() would result in the end of the buffer being reached, and
> > > you just get the first one with front. The API doesn't reflect the
> > > fact that it's buffered at all except perhaps in how you initialize it
> > > (by telling how big the buffer is, though generally I don't want to
> > > have to care about that either).
> > 
> > Transparent buffering sounds sensible but in fact it robs you of
> > important capabilities. It essentially forces you to use grammars with
> > lookahead 1 for all input operations. Being able to peek forward into
> > the stream without committing to read from it allows you to e.g. do
> > operations like "does this stream start with a specific word" etc. As
> > soon
> 
> The thing is though that if I want to be iterating over a string which is
> buffered (from a file or stream or whatever), I want front to be
> immutable(char) or char, not immutable(char)[] or char[]. I can see how
> having an interface which allows startsWith to efficiently check whether
> the buffered string starts with a particular string makes good sense, but
> generally, as far as I'm concerned, that's startsWith's problem. How would
> I even begin to use a buffered range of string[] as a string?
> 
> Normally, when I've used buffered anything, it's been purely for efficiency
> reasons. All I've cared about is having a stream or file or whatever. The
> fact that reading it from the file (or wherever it came from) in a
> buffered manner is more efficient means that I want it buffered, but that
> hasn't had any effect on how I've used it. If I want x characters from the
> file, I ask for x characters. It's the buffered object's problem how many
> reads that does or doesn't do.
> 
> You must be thinking of a use case which I don't normally think of or am not
> aware of. In my experience, buffering has always been an implementation
> detail that you use because it's more efficient, but you don't worry about
> it beyond creating a buffered stream rather than an unbuffered one.

Okay. I think that I've been misunderstanding some stuff here. I forgot that we 
were dealing with input ranges rather than forward ranges, and many range 
functions just don't work with input ranges, since they lack save(). Bleh.

Okay. Honestly, what I'd normally want to be dealing with when reading a stream
or file is a buffered forward range which is implemented in a manner which
minimizes copies. Having to deal with an input range, let alone with what Andrei
is suggesting here, would definitely be annoying, to say the least.

Couldn't we do something which created a new buffer each time that it read in
data from a file? Then it could be a forward range with infinite look-ahead.
The cost of creating a new buffer would likely be minimal, if not outright
negligible, in comparison to reading in the data from a file, and having
multiple buffers would allow it to be a forward range. Perhaps the creation of
a new buffer could even be skipped if save() had never been called and
therefore no external references to the buffer could exist - at least as long
as we're talking about bytes or characters or other value types.
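
To make the idea concrete, here's a rough, untested sketch (all of the names
are made up). Each refill allocates a fresh buffer and links it to the
previous one, so a range returned by save() just follows the links through
chunks which the original range has already moved past, and the GC frees a
chunk once no range refers to it anymore:

import std.stdio;

// One chunk of the file. Chunks are never overwritten in place; they're
// linked together as they're read.
private final class Chunk
{
    ubyte[] data;
    Chunk next;
}

struct BufferedFileRange
{
    private File _file;
    private Chunk _chunk;      // the chunk that front lives in
    private size_t _pos;       // index of front within _chunk.data
    private size_t _chunkSize;

    this(string fileName, size_t chunkSize = 4096)
    {
        _file = File(fileName, "rb");
        _chunkSize = chunkSize;
        _chunk = readChunk();
    }

    // A fresh allocation per read means that earlier chunks stay valid
    // for any saved copies of the range.
    private Chunk readChunk()
    {
        auto c = new Chunk;
        c.data = _file.rawRead(new ubyte[](_chunkSize));
        return c;
    }

    @property bool empty() const { return _pos >= _chunk.data.length; }
    @property ubyte front() const { return _chunk.data[_pos]; }

    void popFront()
    {
        ++_pos;
        if (_pos >= _chunk.data.length && !_file.eof)
        {
            // Only the range which is furthest ahead actually reads from
            // the file; every other copy just follows the existing link.
            if (_chunk.next is null)
                _chunk.next = readChunk();
            _chunk = _chunk.next;
            _pos = 0;
        }
    }

    // Copying the struct is all that save() needs to do, since no chunk
    // is ever mutated after it's been read.
    @property BufferedFileRange save() { return this; }
}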

Maybe there's some major flaw in that basic idea. I don't know. But Andrei's
suggestion sounds like a royal pain for basic I/O. If that's all I had to deal
with when trying to lazily read in a file and process it, I'd just use readText()
instead, since it would be far easier to use. But that's not exactly ideal,
because readText() doesn't work well with large files. Maybe Andrei's idea is
great to have, and maybe it _should_ be in Phobos, but I really think that we
need a higher-level abstraction that turns a stream into a forward range so that
buffered I/O is actually simple to use. As efficient as Andrei's suggestion may
be, it sounds painful to use - especially in comparison to readText().

So, maybe I'm still misunderstanding or missing something here, but what _I_
want to see for I/O streams is a _forward_ range which is buffered and which
reads in the file (or wherever the data comes from) lazily. The more I think
about it, the less I like input ranges. They're just so painfully restrictive.
They may be necessary at times, but I'd _much_ prefer to deal with forward
ranges.

On a related note, perhaps we should add a function called something like
frontN() to some subset of ranges, which returns a T[] (or perhaps a range
over type T) when the range is a range over type T. That way, you could grab a
whole chunk at once without taking it out of the range or having to process the
range element by element. The range wouldn't even need to be a random-access
range. It would just need enough lookahead to grab the first n elements.
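
For forward ranges, something like this would do it (again, a quick, untested
sketch; a buffered range could provide a smarter overload that returns a slice
of its internal buffer instead of copying):

import std.array : array;
import std.range;

// Hypothetical frontN: peek at the first n elements of a forward range
// without consuming them. save() gives an independent copy to read from,
// and take() limits it to n elements.
auto frontN(R)(R range, size_t n)
    if (isForwardRange!R)
{
    return range.save.take(n).array;
}

unittest
{
    auto r = [1, 2, 3, 4, 5];
    assert(r.frontN(3) == [1, 2, 3]);
    assert(r == [1, 2, 3, 4, 5]); // the range itself is untouched
}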

So, I don't know what the best solution to this problem is, but I'd _really_ 
like one which makes buffered I/O _simple_, and while Andrei's solution may be a 
great building block, it is _not_ simple.

- Jonathan M Davis

