An IO Streams Library

Sun Feb 7 17:04:43 PST 2016

On Sunday, 7 February 2016 at 10:50:24 UTC, Johannes Pfau wrote:
> I saw this on code.dlang.org some time ago and had a quick
> look. First of all, this would have to go into Phobos to make
> sure it's used as some kind of standard. Conflicting stream
> libraries would only cause more trouble.
>
> Then if you want to go for Phobos inclusion, I'd recommend
> looking at other stream implementations and learning from their
> mistakes ;-) There's
> https://github.com/schveiguy/phobos/tree/babe9fe338f03cafc0fb50fc0d37ea96505da3e3/std/io
> which was supposed to be a stream replacement for Phobos. Then
> there are also vibe.d streams*.

I saw Steven's stream implementation quite some time ago, and I
had a look at vibe.d's stream implementation just now. I think it
is a mistake to use classes rather than structs for this sort of
thing. I briefly tried implementing it with classes but ran into
problems, the biggest probably being the non-deterministic
destruction of classes. One has to be careful to call f.close()
explicitly to avoid accumulating too many open file descriptors
in programs that open a lot of files. Reference counting solves
this problem nicely, since the file is closed as soon as the last
reference goes away, and it has less overhead. This is one area
where relying on the GC for classes is not ideal. Rust's
ownership system solves this problem quite well, and Python
solves it with "with" statements.
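
To make the reference-counting idea concrete, here is a minimal
Python sketch. In D this would more likely be a struct built on
std.typecons.RefCounted or on a postblit/destructor pair. SharedFd,
retain and release are hypothetical names. The only point is that
the descriptor is closed at the exact moment the last owner lets go,
rather than whenever a collector gets around to it.

```python
import os


class SharedFd:
    """Hypothetical reference-counted wrapper around a raw file descriptor.

    The descriptor is closed deterministically when the last owner releases
    it, instead of whenever a garbage collector happens to run.
    """

    def __init__(self, fd: int) -> None:
        self.fd = fd
        self._refs = 1      # the creator holds the first reference
        self.closed = False

    def retain(self) -> "SharedFd":
        """Register another owner (roughly what copying a struct does in D)."""
        if self.closed:
            raise ValueError("descriptor already closed")
        self._refs += 1
        return self

    def release(self) -> None:
        """Drop one owner; the last release closes the descriptor immediately."""
        if self.closed:
            raise ValueError("descriptor already closed")
        self._refs -= 1
        if self._refs == 0:
            os.close(self.fd)
            self.closed = True

    # Allow scoped use, in the spirit of Python's "with" statement.
    def __enter__(self) -> "SharedFd":
        return self

    def __exit__(self, *exc_info) -> None:
        self.release()


if __name__ == "__main__":
    read_end, write_end = os.pipe()
    os.close(write_end)
    handle = SharedFd(read_end)
    alias = handle.retain()   # a second owner
    handle.release()          # one owner left, so the fd stays open
    print(handle.closed)      # False
    alias.release()           # last owner gone: fd closed right here
    print(handle.closed)      # True
```

No finalizer is involved: the close happens inside the final
release() call.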

> Your Stream interfaces look like standard stream implementations
> (which is a good thing) that also work for unbuffered streams. I
> think it's a good idea to support partial reads and writes. For an
> explanation of why partial reads are useful, see the vibe.d rant
> below. Partial writes are useful as a write syscall can be
> interrupted by POSIX signals to stop the write. I'm not sure if the
> API should expose this feature (e.g. by returning a partial write
> on EINTR), but it can sometimes be useful.

I don't want to assume what the user wants to do in the event of
an EINTR unless a certain behavior is desired 100% of the time, and
I don't think that is the case here. So that is probably something
the user should handle manually, if needed.
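
If a caller does want restart-on-EINTR behavior, it can be layered
on top as a small helper of its own. A hedged Python sketch follows;
retry_on_eintr is a hypothetical name. Note that since Python 3.5
(PEP 475), os.read and friends already retry on EINTR automatically
unless the signal handler raises, so in Python this mostly shows
where such a caller-chosen policy would live.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_on_eintr(op: Callable[[], T], max_attempts: int = 10) -> T:
    """Run op(), restarting it when it fails with InterruptedError (EINTR).

    The retry policy (and its limit) is chosen by the caller; the stream
    itself just reports the interruption.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 1
    while True:
        try:
            return op()
        except InterruptedError:
            if attempt >= max_attempts:
                raise       # give up and let the caller see the EINTR
            attempt += 1


if __name__ == "__main__":
    calls = 0

    def flaky_read() -> bytes:
        """Stand-in for a read that is interrupted twice before succeeding."""
        global calls
        calls += 1
        if calls < 3:
            raise InterruptedError("simulated EINTR")
        return b"data"

    print(retry_on_eintr(flaky_read), calls)   # b'data' 3
```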

> Still, readExactly / writeAll helper functions are useful. I would
> try to implement these as UFCS functions instead of as a struct
> wrapper.

I agree. I went ahead and made that change.
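
For reference, this is roughly what such helpers look like as free
functions layered on a partial read/write primitive. Python has no
UFCS, so ordinary functions taking the stream as their first
argument are the closest analog; in D, UFCS lets the same free
function be called as stream.readExactly(n). The snake_case names
and the two Trickle* test doubles are illustrative, not an existing
API.

```python
def read_exactly(src, n: int) -> bytes:
    """Call src.read() until exactly n bytes have arrived.

    src.read(k) may return fewer than k bytes and returns b"" at EOF.
    """
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = src.read(remaining)
        if not chunk:
            raise EOFError(f"stream ended after {n - remaining} of {n} bytes")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def write_all(dst, data: bytes) -> None:
    """Call dst.write() until every byte of data has been accepted.

    dst.write(buf) may accept only a prefix of buf and returns its length.
    """
    offset = 0
    while offset < len(data):
        written = dst.write(data[offset:])
        if written <= 0:
            raise OSError("write made no progress")
        offset += written


class TrickleReader:
    """Test double: hands out at most 3 bytes per read, like a slow socket."""

    def __init__(self, data: bytes) -> None:
        self.data, self.pos = data, 0

    def read(self, n: int) -> bytes:
        chunk = self.data[self.pos:self.pos + min(n, 3)]
        self.pos += len(chunk)
        return chunk


class TrickleWriter:
    """Test double: accepts at most 3 bytes per write."""

    def __init__(self) -> None:
        self.out = bytearray()

    def write(self, data: bytes) -> int:
        accepted = data[:3]
        self.out += accepted
        return len(accepted)
```

Because the loop lives in one generic place, each stream only has to
implement the single partial read and write.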

> For some streams you'll need a TimeoutException. An interesting
> question is whether users should be able to recover from
> TimeoutExceptions. If a read/write function internally calls the
> read/write POSIX calls more than once and only the last one timed
> out, some data has already been processed, and it's not possible
> to recover from a TimeoutException if the amount of already
> processed data is unknown.
> The simplest solution is using only one syscall internally. Then
> TimeoutException => no data was processed. But this doesn't work
> for read/writeExactly (another reason why read/writeExactly
> shouldn't be the default. vibe.d...)
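
The "one syscall, so a timeout means nothing was consumed" rule can
be sketched in Python with select followed by a single os.read. This
is POSIX-only (on Windows, select works only on sockets),
read_with_timeout is a hypothetical name, and select is used here
only to wait; the single read is the only call that moves data.

```python
import os
import select


def read_with_timeout(fd: int, n: int, timeout: float) -> bytes:
    """Wait up to `timeout` seconds for fd to become readable, then read once.

    On timeout, TimeoutError is raised before any read happens, so no data
    has been consumed and the caller can simply retry. On success the single
    os.read may still return fewer than n bytes.
    """
    readable, _, _ = select.select([fd], [], [], timeout)
    if not readable:
        raise TimeoutError(f"fd {fd} not readable within {timeout} s")
    return os.read(fd, n)


if __name__ == "__main__":
    r, w = os.pipe()
    try:
        read_with_timeout(r, 1024, 0.05)
    except TimeoutError as exc:
        print("timed out, nothing consumed:", exc)
    os.write(w, b"ping")
    print(read_with_timeout(r, 1024, 1.0))   # b'ping'
```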

In the current implementation of readExactly/writeExactly, one 
cannot assume how much was read or written in the event of an 
exception anyway. The only way around this I can see is to return 
the number of bytes read/written in the exception itself. In 
fact, that might solve the TimeoutException problem, too. Hmm...
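
A sketch of that idea in Python: the looping helper catches the
failure and re-raises it wrapped in an exception that records how
many bytes did get through, so the caller knows exactly where to
resume. IncompleteTransfer, write_exactly and FlakySink are
hypothetical names.

```python
class IncompleteTransfer(Exception):
    """Raised when a looping helper fails part-way through a transfer."""

    def __init__(self, transferred: int, cause: BaseException) -> None:
        super().__init__(f"failed after {transferred} bytes: {cause!r}")
        self.transferred = transferred   # bytes that definitely got through
        self.cause = cause


def write_exactly(dst, data: bytes) -> None:
    """Write all of data via partial dst.write() calls, reporting progress on failure."""
    offset = 0
    while offset < len(data):
        try:
            written = dst.write(data[offset:])
        except OSError as exc:           # includes TimeoutError and InterruptedError
            raise IncompleteTransfer(offset, exc) from exc
        if written <= 0:
            raise IncompleteTransfer(offset, OSError("write made no progress"))
        offset += written


class FlakySink:
    """Test double: accepts 4 bytes per call, then times out after `ok_calls` calls."""

    def __init__(self, ok_calls: int) -> None:
        self.out = bytearray()
        self.calls = 0
        self.ok_calls = ok_calls

    def write(self, data: bytes) -> int:
        self.calls += 1
        if self.calls > self.ok_calls:
            raise TimeoutError("simulated timeout")
        accepted = data[:4]
        self.out += accepted
        return len(accepted)
```

With FlakySink(2), writing b"0123456789" fails with transferred == 8,
and the caller can resume with data[8:] once the sink recovers.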

I'd like to keep the fundamental read/write functions at just one 
system call each in order to guarantee that they are atomic in 
relation to each other.
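
In Python terms, such a primitive stream might look like the thin
wrapper below: every method is a single call into the OS and reports
whatever that call achieved, leaving any looping to helpers. FdStream
is a hypothetical name. One caveat: Python's os.read/os.write
transparently restart on EINTR (PEP 475), which is the only
deviation from a strict single syscall.

```python
import os


class FdStream:
    """Unbuffered stream over a POSIX file descriptor.

    Every method is a single os-level call and returns whatever that call
    achieved: read() may return fewer bytes than requested (b"" at EOF) and
    write() may report a partial write. Looping belongs in separate helpers.
    """

    def __init__(self, fd: int) -> None:
        self.fd = fd

    def read(self, n: int) -> bytes:
        return os.read(self.fd, n)        # one read(2)

    def write(self, data: bytes) -> int:
        return os.write(self.fd, data)    # one write(2)

    def close(self) -> None:
        os.close(self.fd)                 # one close(2)


if __name__ == "__main__":
    r, w = os.pipe()
    reader, writer = FdStream(r), FdStream(w)
    writer.write(b"abc")
    print(reader.read(100))   # b'abc': a partial read, not an error
    writer.close()
    print(reader.read(100))   # b'': EOF
    reader.close()
```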

> Regarding buffers / sliding windows, I'd have a look at
> https://github.com/schveiguy/phobos/blob/babe9fe338f03cafc0fb50fc0d37ea96505da3e3/std/io/buffer.d
>
> Another design question is whether there should be an interface
> for such buffered streams or whether it's OK to have only
> unbuffered streams + one buffer struct / class. Basically the
> question is whether there might be streams that can offer a
> buffer interface but can't use the standard implementation.

I think it's OK to re-implement buffering for different types of 
streams where it is more efficient to do so. For example, there 
is no need to implement buffering for an in-memory stream 
because, by definition, it is already buffered.

I'm not sure if having multiple buffering strategies would be 
useful. Right now, there is only the fixed-size sliding window.
If multiple buffering strategies are useful, then it makes sense 
to have all streams unbuffered by default and have separate 
buffering implementations.
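
To illustrate the "unbuffered by default, buffering as a separate
layer" option, here is a minimal fixed-size sliding window in Python
that wraps anything with a partial read(n) method. SlidingWindow and
its fill/window/discard methods are hypothetical. discard only
advances an offset; the unconsumed tail is slid to the front when
new data is appended.

```python
class SlidingWindow:
    """Fixed-capacity read buffer layered over any unbuffered source.

    `source.read(k)` may return fewer than k bytes and returns b"" at EOF.
    """

    def __init__(self, source, capacity: int = 4096) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.source = source
        self.capacity = capacity
        self._buf = b""
        self._start = 0   # offset of the first unconsumed byte in _buf

    def __len__(self) -> int:
        """Number of buffered, unconsumed bytes."""
        return len(self._buf) - self._start

    def window(self) -> memoryview:
        """Zero-copy view of the unconsumed bytes."""
        return memoryview(self._buf)[self._start:]

    def fill(self) -> int:
        """Do at most one source read into the free space.

        Returns the number of bytes added; 0 means the window is full or
        the source is at EOF.
        """
        free = self.capacity - len(self)
        if free == 0:
            return 0
        chunk = self.source.read(free)
        if chunk:
            # Slide: keep only the unconsumed tail, then append the new bytes.
            self._buf = self._buf[self._start:] + chunk
            self._start = 0
        return len(chunk)

    def discard(self, n: int) -> None:
        """Consume n bytes from the front without copying them anywhere."""
        if not 0 <= n <= len(self):
            raise ValueError("can only discard bytes that are buffered")
        self._start += n
```

An in-memory stream would not need this layer at all: its window is
simply the unread part of the data it already holds.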

There is an interesting buffering approach here that is mainly 
geared towards parsing: 
https://github.com/DmitryOlshansky/datapicked/blob/master/dpick/buffer/buffer.d

> * vibe.d stream rant ahead:
>
> vibe.d streams get some things right and some things very wrong.
> For example, their leastSize/empty/read combo means you might
> actually have to implement reading data in any of these
> functions. Users have to handle timeouts or other errors for any
> of these as well.
>
> Then, the API requires a buffered stream; it simply won't work
> for unbuffered IO (leastSize, empty). And the fact that read reads
> exactly n bytes makes stream implementations more complicated
> (re-reading until enough data has been read should be done by a
> generic function, not reimplemented in every stream). It even
> makes some user code more complicated: I've implemented a serial
> port library for vibe.d. If I don't know how many bytes will
> arrive with the next packet, the POSIX read function usually
> returns the expected/available amount of data. But vibe.d
> requires me to specify a fixed length when calling the stream
> read method. This leads to ugly code using peek...
>
> Then vibe.d also mixes the sliding window / buffer concept into
> the stream class, but does so in a bad way. A sliding window
> should expose the internal buffer so that it's possible to
> consume bytes from the buffer, skip bytes, refill... In vibe.d
> you can peek at the buffer, but you can't discard data. You have
> to call read instead, which copies from the internal buffer to an
> external buffer, even if you only want to skip data. Even worse,
> your external buffer size is limited, so you have to implement
> some loop logic if you want to skip more data than fits in your
> buffer. And all you need is a discard(size_t n) function which
> does _buffer = _buffer[n .. $] in the stream class...
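
The difference can be sketched in Python: skip_by_reading mimics the
copy-into-a-bounded-scratch-buffer loop described above, while
WindowReader.skip only slices its internal buffer (discard) and
refills it when it runs dry. Both names are hypothetical, and
io.BytesIO merely stands in for a source whose read(n) may return
fewer bytes than asked for.

```python
import io


def skip_by_reading(read, n: int, scratch_size: int = 4096) -> int:
    """Skip n bytes by copying them into a bounded scratch buffer and dropping them."""
    skipped = 0
    while skipped < n:
        chunk = read(min(scratch_size, n - skipped))   # bytes are copied out
        if not chunk:
            break                                      # EOF before n bytes
        skipped += len(chunk)
    return skipped


class WindowReader:
    """Minimal reader that exposes its internal buffer."""

    def __init__(self, source_read, capacity: int = 4096) -> None:
        self._source_read = source_read
        self._capacity = capacity
        self._buffer = memoryview(b"")   # unconsumed bytes

    def peek(self) -> memoryview:
        return self._buffer

    def refill(self) -> int:
        """If the buffer is exhausted, do one source read; return bytes now buffered."""
        if len(self._buffer) == 0:
            self._buffer = memoryview(self._source_read(self._capacity))
        return len(self._buffer)

    def discard(self, n: int) -> None:
        # The one-liner from the post: _buffer = _buffer[n .. $]
        # (slicing a memoryview copies nothing).
        self._buffer = self._buffer[n:]

    def skip(self, n: int) -> int:
        """Skip n bytes (even more than the capacity) via discard and refill."""
        skipped = 0
        while skipped < n:
            if self.refill() == 0:
                break                                  # EOF before n bytes
            step = min(n - skipped, len(self._buffer))
            self.discard(step)
            skipped += step
        return skipped


if __name__ == "__main__":
    data = b"x" * 10_000 + b"END"
    print(skip_by_reading(io.BytesIO(data).read, 10_000))   # 10000
    reader = WindowReader(io.BytesIO(data).read)
    print(reader.skip(10_000), bytes(reader.peek()))        # 10000 b'END'
```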

These are the golden nuggets of experience I was looking for when 
making this post. They definitely help to guide an ergonomic API 
design. Standing on the shoulders of giants and such. Thanks!

> TLDR: API design is very important.

Completely agree.