buffered input

Sun Feb 6 07:11:18 PST 2011

On 2/6/11 3:22 EST, Jonathan M Davis wrote:
> Okay. I think that I've been misunderstanding some stuff here. I forgot that we
> were dealing with input ranges rather than forward ranges, and many range
> functions just don't work with input ranges, since they lack save(). Bleh.
>
> Okay. Honestly, what I'd normally want to be dealing with when reading a stream
> or file is a buffered forward range which is implemented in a manner which
> minimizes copies. Having to deal with an input range, let alone what Andrei is
> suggesting here would definitely be annoying to say the least.
>
> Couldn't we do something which created a new buffer each time that it read in
> data from a file, and then it could be a forward range with infinite look-ahead?
> The cost of creating a new buffer would likely be minimal, if not outright
> negligible, in comparison to reading in the data from a file, and having multiple
> buffers would allow it to be a forward range. Perhaps, the creation of a new
> buffer could even be skipped if save had never been called and therefore no
> external references to the buffer would exist - at least as long as we're talking
> about bytes or characters or other value types.

APIs predicated on the notion that I/O is very expensive and extra 
overheads are not measurable have paid dearly for it (e.g. C++'s iostreams).

> Maybe there's some major flaw in that basic idea. I don't know. But Andrei's
> suggestion sounds like a royal pain for basic I/O. If that's all I had to deal
> with when trying to lazily read in a file and process it, I'd just use readText()
> instead, since it would just be way easier to use.

Clearly, reading the entire file into an in-memory structure simplifies
things. But the proposed streaming interface remains as convenient as
it always was; the two added APIs help people who need extra flexibility
without hurting efficiency.

If you want to read a file in Java: 
http://www.java-tips.org/java-se-tips/java.io/how-to-read-file-in-java.html

In C (with many caveats): http://www.phanderson.com/files/file_read.html

In D:

foreach (line; File("name").byLine()) {
    ...
}

I plan to add a simpler API:

foreach (line; File.byLine("name")) {
    ...
}

To read fixed-size chunks, use byChunk. Together, byLine and byChunk
cover the vast majority of file I/O needs.
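
For completeness, here is a minimal sketch of the byChunk case. The
helper name countBytes, the 4096-byte default, and the file name
demo.bin are made up for the example; only File and byChunk come from
std.stdio.

```d
import std.stdio;

/// Returns the total number of bytes in the file, reading it
/// chunkSize bytes at a time. (Hypothetical helper for illustration.)
size_t countBytes(string path, size_t chunkSize = 4096)
{
    size_t total = 0;
    // byChunk reuses one internal buffer, so each chunk is valid only
    // until the next iteration; .dup it if you need to keep it.
    foreach (ubyte[] chunk; File(path, "rb").byChunk(chunkSize))
        total += chunk.length;
    return total;
}

void main()
{
    static import std.file;
    // Create a throwaway 10,000-byte file so the example is self-contained.
    std.file.write("demo.bin", new ubyte[](10_000));
    scope (exit) std.file.remove("demo.bin");

    writeln(countBytes("demo.bin"));
}
```

Every chunk has the requested size except possibly the last one, which
is why the loop adds up chunk.length instead of assuming chunkSize.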

There are two limitations of the current APIs:

1. You can't add a new line to the existing line (or a buffer to the 
existing buffer) if you sometimes want to process multiple lines as a 
logical unit (some programs and file formats need that, as well as 
composing streams).

2. You can't comfortably read data of user-specified size if that size 
varies. This is the case for e.g. binary formats where you need to read 
"doped chunks", i.e. chunks prefixed by their lengths.
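
To make limitation 2 concrete, this is roughly what reading
length-prefixed chunks looks like today if you drop down to rawRead.
It is only a sketch of the workaround, not the proposed API. The
layout (a 4-byte little-endian length followed by the payload), the
helper name readPrefixedChunks, and the file name chunks.bin are
assumptions made for the example.

```d
import std.stdio;
import std.bitmanip : littleEndianToNative, nativeToLittleEndian;

/// Reads chunks laid out as [uint length, little-endian][payload].
/// The layout is an assumption for this example, not a standard format.
ubyte[][] readPrefixedChunks(string path)
{
    auto f = File(path, "rb");
    ubyte[][] chunks;
    ubyte[4] header;
    for (;;)
    {
        auto got = f.rawRead(header[]);
        if (got.length == 0)
            break;                          // clean end of file
        if (got.length != header.length)
            throw new Exception("truncated length prefix");
        immutable len = littleEndianToNative!uint(header);
        auto payload = new ubyte[](len);
        // rawRead may reject an empty buffer, so skip it for len == 0.
        if (len > 0 && f.rawRead(payload).length != len)
            throw new Exception("truncated chunk");
        chunks ~= payload;
    }
    return chunks;
}

void main()
{
    static import std.file;
    // Build a file holding two chunks: "abc" and "hello".
    ubyte[] data;
    foreach (s; ["abc", "hello"])
    {
        ubyte[4] prefix = nativeToLittleEndian(cast(uint) s.length);
        data ~= prefix[];
        data ~= cast(const(ubyte)[]) s;
    }
    std.file.write("chunks.bin", data);
    scope (exit) std.file.remove("chunks.bin");

    foreach (chunk; readPrefixedChunks("chunks.bin"))
        writeln(cast(string) chunk);
}
```

Note that this allocates a fresh array per chunk and bypasses byChunk
entirely, since byChunk only hands out chunks of one fixed size.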

My proposal addresses 1 and makes 2 possible.

Andrei