Empty VS null array?

Mon Oct 21 04:28:43 PDT 2013

On Mon, 21 Oct 2013 11:58:07 +0100, Regan Heath <regan at netmail.co.nz>  
wrote:

> On Fri, 18 Oct 2013 20:58:07 +0100, H. S. Teoh <hsteoh at quickfur.ath.cx>  
> wrote:
>
>> On Fri, Oct 18, 2013 at 02:04:41PM -0400, Jonathan M Davis wrote:
>>> On Friday, October 18, 2013 10:38:12 H. S. Teoh wrote:
>> [...]
>>> > IMO, distinguishing between null and empty arrays is bad
>>> > abstraction. I agree with D's "conflation" of null with empty,
>>> > actually. Conceptually speaking, an array is a sequence of values of
>>> > non-negative length. An array with non-zero length contains at least
>>> > one element, and is therefore non-empty, whereas an array with zero
>>> > length is empty. Same thing goes with a slice. A slice is a view
>>> > into zero or more array elements. A slice with zero length is empty,
>>> > and a slice with non-zero length contains at least one element.
>>> > There's nowhere in this conceptual scheme for such a thing as a
>>> > "null array" that's distinct from an empty array. This distinction
>>> > only crops up in implementation, and IMO leads to code smells
>>> > because code should be operating based on the conceptual behaviour
>>> > of arrays rather than on the implementation details.
>>>
>>> In most languages, an array is a reference type, so there's the
>>> question of whether it's even _there_. There's a clear distinction
>>> between having a null reference to an array and having a reference to an
>>> empty array. This is particularly clear in C++ where an array is just
>>> a pointer, but it's true in plenty of other languages that don't treat
>>> arrays as pointers (e.g. Java).
>>
>> To me, these are just implementation details. Conceptually speaking, D
>> arrays are actually slices, so that gives them reference semantics.
>> Being slices, they refer to zero or more elements, so either their
>> length is zero, or not. There is no concept of nullity here. That only
>> comes because we chose to implement slices as pointer + length, so
>> implementation-wise we can distinguish between a null .ptr and a
>> non-null .ptr. But from the conceptual POV, if we consider slices as a
>> whole, they are just a sequence of zero or more elements. Null has no
>> meaning here.
>>
>> Put another way, slices themselves are value types, but they refer to
>> their elements by reference. It's a subtle but important difference.
>>
>>
>>> The problem is that D put the length on the stack alongside the
>>> pointer, making it so that D arrays are sort of reference types and
>>> sort of not. The pointer is a reference type, but the length is a
>>> value type, making the dynamic array half and half. If it were fully a
>>> reference type, then there would be no problem with distinguishing
>>> between null and empty arrays. A null array is simply a null reference
>>> to an array. But since D arrays aren't quite reference types, that
>>> doesn't work.
>> [...]
>>
>> I think the issue comes from the preconceived notion acquired from other
>> languages that arrays are some kind of object floating somewhere out
>> there on the heap, for which we have a handle here. Thus we have the
>> notion of null, being the case when we have a handle here but there's
>> actually nothing out there.
>>
>> But if we consider the slice as being a thing right *here* and now,
>> referencing some sequence of elements out there, then we arrive at D's
>> notion of null and empty being the same thing, because while there may
>> be no elements out there being referenced, the handle (i.e. slice) is
>> always *here*. In that sense, there's no distinction between an empty
>> slice and a null slice: either there are elements out there that we're
>> referring to, or there are none. There is no third "null" case.
>>
>> There's no reason why we should adopt the previous notion if this one
>> works just as well, if not better. I argue that the second notion is
>> conceptually cleaner, because it eliminates an unnecessary distinction
>> between an empty sequence and a non-existent sequence (which then leads
>> to similar issues one encounters with null pointers).
>
> If what you say is true then slices would and could never be null...

Aargh, my apologies I misread your post.  Ignore my first reply.

I agree that a slice which can never be null is like a pre-null-checked array,
which is a good thing.  The issue I have had in the past is with strings
(not slices) changing from null to empty and/or vice versa.

Also, it's not at all clear when you're dealing with a pre-checked not-null
slice and when you're dealing with a possibly null array, for example:

import std.stdio;

void foo(string arr)
{
	if (arr is null) writefln("null");
	else writefln("not null");
	if (arr.length == 0) writefln("empty");
	else writefln("not empty");
}

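void main()
{
	string arr;          // default-initialised string
	foo(arr);
	foo(arr[0..$]);      // full slice of the default-initialised string
	arr = "";            // empty string literal
	foo(arr);
	foo(arr[0..$]);      // full slice of the empty literal
}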

Output:
null
empty
null
empty
not null
empty
not null
empty

Which of those are strings/arrays and which are slices?  Why are the ones  
formed by actually slicing coming up as "is null"?
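
A rough sketch of what I assume is going on, based on the pointer + length
description quoted above: "is null" seems to follow the .ptr field, and
slicing carries .ptr along unchanged.  The describe() helper is made up
purely for illustration, it is not part of any library.

```d
import std.stdio;

// Hypothetical helper: reports the two fields that make up a string slice.
string describe(string s)
{
    return (s.ptr is null ? "null ptr" : "non-null ptr")
        ~ (s.length == 0 ? ", length 0" : ", length > 0");
}

void main()
{
    string a;        // default-initialised, nothing assigned yet
    string b = "";   // empty string literal

    writeln(describe(a));          // null ptr, length 0
    writeln(describe(a[0 .. $]));  // null ptr, length 0 (slicing keeps the null ptr)
    writeln(describe(b));          // non-null ptr, length 0
    writeln(describe(b[0 .. $]));  // non-null ptr, length 0

    assert(a == b);    // equal by content: both have no elements
    assert(a !is b);   // but not identical: the .ptr fields differ
}
```

So == only looks at the elements, while "is" also sees the pointer, which is
where the null/empty difference leaks through.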

(This last, not directed at you, just venting..)

I can understand arguing against null from a safety point of view.

I can understand arguing against designs that use null, for the same  
reasons.

I disagree, but then I have comfortably used null for a long time so the  
cost/benefit of using null is heavily on the benefit side for me.  I can  
understand for others this may not be the case.

But I cannot understand someone who says they have no use for the concept
of non-existence, or that no code will ever want to make the distinction.
That is just plainly incorrect: implementing a singleton pattern
(probably a bad example :p) relies on being able to check for
non-existence, using null as the indicator, and we do it all the time.
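
For what it's worth, a minimal sketch of the kind of singleton I mean.  The
Registry class and its get() method are hypothetical names, and this ignores
threading entirely (static class members are thread-local by default in D).

```d
// Hypothetical lazily created singleton: null means "does not exist yet".
class Registry
{
    private static Registry instance_;   // stays null until first use

    private this() {}                    // no construction from outside this module

    static Registry get()
    {
        if (instance_ is null)           // the non-existence check
            instance_ = new Registry;
        return instance_;
    }
}

void main()
{
    auto first = Registry.get();
    auto second = Registry.get();
    assert(first !is null);
    assert(first is second);   // both calls hand back the same object
}
```

Without a way to say "not there yet", get() would need a separate flag to
carry the same information.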

Regan
