Empty VS null array?

Tue Oct 22 01:43:29 PDT 2013

On Mon, 21 Oct 2013 17:49:43 +0100, H. S. Teoh <hsteoh at quickfur.ath.cx>  
wrote:

> On Mon, Oct 21, 2013 at 04:47:05PM +0100, Regan Heath wrote:
>> On Mon, 21 Oct 2013 15:02:35 +0100, H. S. Teoh
>> <hsteoh at quickfur.ath.cx> wrote:
>>
>> >On Mon, Oct 21, 2013 at 10:40:14AM +0100, Regan Heath wrote:
>> >>On Fri, 18 Oct 2013 17:36:28 +0100, Dicebot <public at dicebot.lv> wrote:
>> >>
>> >>>On Friday, 18 October 2013 at 15:42:56 UTC, Andrei Alexandrescu wrote:
>> >>>>That's bad API design, pure and simple. The function should e.g.
>> >>>>return the string including the line terminator, and only return
>> >>>>an empty (or null) string upon EOF.
>> >>>
>> >>>I'd say it should throw upon EOF as it is pretty high-level
>> >>>convenience function.
>> >>
>> >>I disagree.  Exceptions should never be used for flow control so the
>> >>rule is to throw on exceptional occurrences ONLY, not on something
>> >>that will ALWAYS eventually happen.
>> >[...]
>> >
>> >	while (!file.eof) {
>> >		auto line = file.readln(); // never throws
>> >		...
>> >	}
>>
>> For a file this is implementable (without a buffer) but not for a
>> socket or similar source/stream where a read MUST be performed to
>> detect EOF.  So, if you're implementing a line reader over multiple
>> sources, you would need to buffer.  Not the end of the world, but
>> definitely more complicated than just returning a null, no?
> [...]
>
> This is actually a very interesting issue to me, and one which I've
> thought about a lot in the past. There are two incompatible (albeit with
> much overlap) approaches here. One is the Unix approach where EOF is
> unknown until you try to read past the end of a file (socket, etc.), and
> the other is where EOF is known *before* you perform a read.
>
> Personally, I prefer the second approach as being conceptually cleaner:
> an input stream should "know" when it doesn't have any more data, so
> that its EOF state can be queried at any time. Conceptually speaking one
> shouldn't need to (try to) read from it before realizing there's nothing
> left.
>
> However, I understand that the Unix approach is easier to implement, in
> the sense that if you have a network socket, it may be the case that
> when you attempt to read from it, it is still connected, but before any
> further data is received, the remote end disconnects. In this case, the
> OS can't reasonably predict when there will be more incoming data, so
> you do have to read the socket before finding out that the remote end
> is going to disconnect without sending anything more.
>
> In terms of API design, though, I still lean towards the approach where
> EOF is always query-able, because it leads to cleaner code. This can be
> implemented on Posix by having .eof read a single byte (or whatever unit
> is expected) and buffering it, and the subsequent readln() takes this
> buffering into account. This slight complication in implementation is
> worth achieving the nicer user-facing API, IMO.

I don't agree the user-facing API is nicer.  It is more complex both in  
concept and implementation.

API #1: 1 function, readline(), returns null on EOF.  You call readline()
and check the result for null.  The check naturally follows the attempt
to read, which is the task you are trying to accomplish.  Simple,
straightforward.
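
A minimal sketch of API #1 in D, to make the shape concrete.  LineSource
and readAll are hypothetical names of mine (an in-memory stand-in for a
file or socket), not anything in Phobos.  Note the check has to use "is",
because in D an empty string compares == to null.

```d
import std.string : indexOf;

/// Hypothetical in-memory line source; stands in for a file or socket.
struct LineSource
{
    string data;

    /// Returns the next line without its terminator, or null at EOF.
    /// A blank line comes back as a zero-length but non-null slice.
    string readline()
    {
        if (data.length == 0)
            return null;                  // EOF
        auto nl = data.indexOf('\n');
        if (nl < 0)
        {
            auto last = data;             // final line, no terminator
            data = null;
            return last;
        }
        auto line = data[0 .. nl];        // may be empty, never null
        data = data[nl + 1 .. $];
        return line;
    }
}

/// The caller's loop: read first, then test the result.
string[] readAll(ref LineSource src)
{
    string[] lines;
    string line;
    while ((line = src.readline()) !is null)
        lines ~= line;
    return lines;
}

unittest
{
    auto src = LineSource("a\n\nb");
    assert(readAll(src) == ["a", "", "b"]);
    assert(src.readline() is null);       // stays at EOF
}
```

The blank line in the middle is returned as an empty (non-null) string,
so it is not mistaken for EOF.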

API #2: 2 functions, readline() throws on EOF, isEof() checks for EOF.
Your purpose is to read lines, so you call readline(), and it is
easy to forget to call isEof().  Coding the example loop above requires
you to think about EOF /before/ you read a line, which is not how people
think.  This API is therefore more complex and less intuitive, for no gain.
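
For comparison, a rough sketch of API #2 layered over a source where EOF
is only discovered by reading (as with a socket).  RawSource, tryRead and
EofReader are hypothetical names for illustration only.  isEof() has to
read ahead and stash the result, and readline() has to know about the
stash.

```d
import std.exception : assertThrown, enforce;

/// Hypothetical source that only reveals EOF when a read is attempted;
/// here it just hands out pre-canned lines.
struct RawSource
{
    string[] pending;

    /// Returns the next line, or null once exhausted.
    string tryRead()
    {
        if (pending.length == 0)
            return null;
        auto line = pending[0];
        pending = pending[1 .. $];
        return line;
    }
}

/// API #2: isEof() plus a readline() that throws at EOF.
struct EofReader
{
    RawSource src;
    string lookahead;       // result of the read-ahead, if any
    bool haveLookahead;     // true when lookahead holds an unread result

    /// Must perform (and buffer) a real read to answer the question.
    bool isEof()
    {
        if (!haveLookahead)
        {
            lookahead = src.tryRead();
            haveLookahead = true;
        }
        return lookahead is null;
    }

    /// Hands out the buffered line; throws if the source is exhausted.
    string readline()
    {
        enforce(!isEof(), "read past EOF");
        haveLookahead = false;
        return lookahead;
    }
}

unittest
{
    auto r = EofReader(RawSource(["a", "", "b"]));
    string[] got;
    while (!r.isEof())
        got ~= r.readline();
    assert(got == ["a", "", "b"]);
    assertThrown(r.readline());     // forgetting isEof() ends in a throw
}
```

The extra state (lookahead, haveLookahead) is the buffering that API #1
does not need.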

So, having a usable null state allows the simpler, more direct API.  Lack  
of it requires a more complicated design and a more complicated  
implementation.

R
