Empty VS null array?

H. S. Teoh hsteoh at quickfur.ath.cx
Mon Oct 21 09:49:43 PDT 2013


On Mon, Oct 21, 2013 at 04:47:05PM +0100, Regan Heath wrote:
> On Mon, 21 Oct 2013 15:02:35 +0100, H. S. Teoh
> <hsteoh at quickfur.ath.cx> wrote:
> 
> >On Mon, Oct 21, 2013 at 10:40:14AM +0100, Regan Heath wrote:
> >>On Fri, 18 Oct 2013 17:36:28 +0100, Dicebot <public at dicebot.lv> wrote:
> >>
> >>>On Friday, 18 October 2013 at 15:42:56 UTC, Andrei Alexandrescu wrote:
> >>>>That's bad API design, pure and simple. The function should e.g.
> >>>>return the string including the line terminator, and only return
> >>>>an empty (or null) string upon EOF.
> >>>
> >>>I'd say it should throw upon EOF as it is pretty high-level
> >>>convenience function.
> >>
> >>I disagree.  Exceptions should never be used for flow control, so
> >>the rule is to throw on exceptional occurrences ONLY, not on
> >>something that will ALWAYS eventually happen.
> >[...]
> >
> >	while (!file.eof) {
> >		auto line = file.readln(); // never throws
> >		...
> >	}
> 
> For a file this is implementable (without a buffer) but not for a
> socket or similar source/stream where a read MUST be performed to
> detect EOF.  So, if you're implementing a line reader over multiple
> sources, you would need to buffer.  Not the end of the world, but
> definitely more complicated than just returning a null, no?
[...]

This is actually a very interesting issue to me, and one which I've
thought about a lot in the past. There are two incompatible (albeit with
much overlap) approaches here. One is the Unix approach where EOF is
unknown until you try to read past the end of a file (socket, etc.), and
the other is where EOF is known *before* you perform a read.

Personally, I prefer the second approach as conceptually cleaner: an
input stream should "know" when it doesn't have any more data, so that
its EOF state can be queried at any time. Conceptually speaking, one
shouldn't need to (try to) read from it before realizing there's
nothing left.

However, I understand that the Unix approach is easier to implement: a
network socket may still be connected when you attempt to read from it,
yet the remote end may disconnect before sending any further data. In
this case, the OS can't reasonably predict whether more data will ever
arrive, so you do have to read the socket before finding out that the
remote end is going to disconnect without sending anything more.
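
To illustrate (a minimal sketch of my own, assuming fd is an
already-open Posix file descriptor or socket), the Unix-style loop only
discovers EOF when read() returns 0:

	import core.sys.posix.unistd : read;

	void consume(int fd)
	{
		ubyte[4096] buf;
		for (;;)
		{
			auto n = read(fd, buf.ptr, buf.length);
			if (n == 0)
				break;	// EOF: only known after attempting the read
			if (n < 0)
				throw new Exception("read failed");
			// process buf[0 .. n] ...
		}
	}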

In terms of API design, though, I still lean towards the approach where
EOF is always query-able, because it leads to cleaner code. This can be
implemented on Posix by having .eof read a single byte (or whatever unit
is expected) and buffer it, with the subsequent readln() taking this
buffering into account. This slight complication in the implementation
is worth it for the nicer user-facing API, IMO.
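
Here's a rough sketch of what I mean (my own illustration, not anything
in Phobos; LineReader and its members are made-up names, again assuming
a Posix descriptor): .eof peeks one byte and stashes it, and readln()
drains the stash before reading more.

	import core.sys.posix.unistd : read;

	struct LineReader
	{
		int fd;
		ubyte[] pending;	// byte peeked by .eof, not yet consumed

		// Query-able EOF: peek one byte ahead and buffer it.
		@property bool eof()
		{
			if (pending.length) return false;
			ubyte[1] b;
			auto n = read(fd, b.ptr, 1);	// may block until data or EOF
			if (n <= 0) return true;	// 0 = EOF (errors treated as EOF here)
			pending ~= b[0];
			return false;
		}

		// Read up to and including '\n', or to EOF; peeked byte goes first.
		string readln()
		{
			ubyte[] line = pending;
			pending = null;
			while (!(line.length && line[$ - 1] == '\n'))
			{
				ubyte[1] b;
				auto n = read(fd, b.ptr, 1);
				if (n <= 0) break;	// EOF mid-line
				line ~= b[0];
			}
			return cast(string) line.idup;
		}
	}

With a wrapper like that, the original loop works as written:
while (!r.eof) { auto line = r.readln(); ... }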


T

-- 
I've been around long enough to have seen an endless parade of magic new
techniques du jour, most of which purport to remove the necessity of
thought about your programming problem.  In the end they wind up
contributing one or two pieces to the collective wisdom, and fade away
in the rearview mirror. -- Walter Bright

