Empty VS null array?

Fri Oct 18 12:58:07 PDT 2013

On Fri, Oct 18, 2013 at 02:04:41PM -0400, Jonathan M Davis wrote:
> On Friday, October 18, 2013 10:38:12 H. S. Teoh wrote:
[...]
> > IMO, distinguishing between null and empty arrays is bad
> > abstraction. I agree with D's "conflation" of null with empty,
> > actually. Conceptually speaking, an array is a sequence of values of
> > non-negative length. An array with non-zero length contains at least
> > one element, and is therefore non-empty, whereas an array with zero
> > length is empty. Same thing goes with a slice. A slice is a view
> > into zero or more array elements. A slice with zero length is empty,
> > and a slice with non-zero length contains at least one element.
> > There's nowhere in this conceptual scheme for such a thing as a
> > "null array" that's distinct from an empty array. This distinction
> > only crops up in implementation, and IMO leads to code smells
> > because code should be operating based on the conceptual behaviour
> > of arrays rather than on the implementation details.
> 
> In most languages, an array is a reference type, so there's the
> question of whether it's even _there_. There's a clear distinction
> between having a null reference to an array and having a reference to
> an empty array. This is particularly clear in C++ where an array is
> just a pointer, but it's true in plenty of other languages that don't
> treat arrays as pointers (e.g. Java).

To me, these are just implementation details. Conceptually speaking, D
arrays are actually slices, so that gives them reference semantics.
Being slices, they refer to zero or more elements, so either their
length is zero, or not. There is no concept of nullity here. That only
comes because we chose to implement slices as pointer + length, so
implementation-wise we can distinguish between a null .ptr and a
non-null .ptr. But from the conceptual POV, if we consider slices as a
whole, they are just a sequence of zero or more elements. Null has no
meaning here.
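
To make the pointer + length point concrete, here is a minimal sketch.
The helper isEmpty and the variable names are made up for illustration.
It assumes the usual D behaviour that == compares contents while `is`
compares .ptr and .length:

```d
// Hypothetical helper: judges a slice purely by its length.
bool isEmpty(const int[] s) { return s.length == 0; }

void main()
{
    int[] a = null;             // .ptr is null, .length is 0
    int[] backing = [1, 2, 3];
    int[] b = backing[0 .. 0];  // .ptr points into backing, .length is 0

    // The implementation can tell the two apart through .ptr ...
    assert(a.ptr is null);
    assert(b.ptr !is null);

    // ... but as sequences of elements they are the same thing.
    assert(isEmpty(a) && isEmpty(b));
    assert(a == b);

    // Only identity comparison exposes the difference.
    assert(a is null);
    assert(b !is null);
}
```

Code that sticks to .length and == never sees the difference. Only
code that reaches for .ptr or `is null` does.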

Put another way, slices themselves are value types, but they refer to
their elements by reference. It's a subtle but important difference.
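
A rough sketch of that difference, with hypothetical helper names
(setFirst and dropFirst are not from any library). Passing a slice
copies the pointer + length pair, but both copies still refer to the
same elements:

```d
// Hypothetical helper: writes through the slice it was handed.
void setFirst(int[] s, int value)
{
    if (s.length > 0)
        s[0] = value;
}

// Hypothetical helper: shrinks only its own local copy of the slice.
void dropFirst(int[] s)
{
    if (s.length > 0)
        s = s[1 .. $];
}

void main()
{
    int[] a = [1, 2, 3];

    setFirst(a, 42);
    assert(a[0] == 42);     // the elements are shared with the callee

    dropFirst(a);
    assert(a.length == 3);  // the caller's pointer + length is untouched
}
```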

> The problem is that D put the length on the stack alongside the
> pointer, making it so that D arrays are sort of reference types and
> sort of not. The pointer is a reference type, but the length is a
> value type, making the dynamic array half and half. If it were fully a
> reference type, then there would be no problem with distinguishing
> between null and empty arrays. A null array is simply a null reference
> to an array. But since D arrays aren't quite reference types, that
> doesn't work.
[...]

I think the issue comes from the preconceived notion acquired from other
languages that arrays are some kind of object floating somewhere out
there on the heap, for which we have a handle here. Thus we have the
notion of null, being the case when we have a handle here but there's
actually nothing out there.

But if we consider the slice as being a thing right *here* and now,
referencing some sequence of elements out there, then we arrive at D's
notion of null and empty being the same thing, because while there may
be no elements out there being referenced, the handle (i.e. slice) is
always *here*. In that sense, there's no distinction between an empty
slice and a null slice: either there are elements out there that we're
referring to, or there are none. There is no third "null" case.

There's no reason why we should adopt the first notion if the second
works just as well, if not better. I argue that the second notion is
conceptually cleaner, because it eliminates an unnecessary distinction
between an empty sequence and a non-existent sequence (which then leads
to similar issues one encounters with null pointers).
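
As a small illustration of what this buys you, here is a sketch of a
function that needs no null check at all (total is a made-up name). A
null slice, an empty slice into existing data, and a non-empty slice
all go through the same code path:

```d
// Hypothetical helper: sums a slice without checking for null.
int total(const int[] xs)
{
    int sum = 0;
    foreach (x; xs)   // iterates zero times for any zero-length slice
        sum += x;
    return sum;
}

void main()
{
    int[] none = null;
    int[] data = [4, 5, 6];

    assert(total(none) == 0);          // null slice
    assert(total(data[0 .. 0]) == 0);  // empty slice into real data
    assert(total(data) == 15);         // non-empty slice
}
```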

T

-- 
Answer: Because it breaks the logical sequence of discussion. / Question: Why is top posting bad?