Just a friendly reminder about using arrays in boolean conditions

Mon Nov 25 09:38:10 UTC 2024

On Monday, November 25, 2024 1:50:07 AM MST Walter Bright via Digitalmars-d 
wrote:
> C has an equivalent behavior distinguishing between a null pointer and a 0
> length string:
>
> ```
> char *s;  // string
> if (s)    // pointer
> if (*s)   // length
> ```
>
> ```
> char[] a;     // array
> if (a)        // pointer
> if (a.length) // length
> ```

Given that C arrays decay to pointers, there's definitely reason to care
about whether they're null or not, but D arrays are not pointers. D arrays
may have originally come from C arrays, but ultimately, the two are
fundamentally different.

D arrays are designed in such a way that there is really no reason to care
one whit whether they're null or not. They're not pointers, and you don't
normally access their ptr member (and when you do, it's @system). When you
access the individual elements, you get a RangeError if you attempt to
access an element outside of the array, and if you want to know whether an
index is within the array, you check its length. If its length isn't 0, then
its ptr isn't null, and you don't have any reason to care about null. If its
length is 0, then whether its ptr is null is also irrelevant, because you're
not going to access non-existent elements.
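
As a minimal sketch of that point (the helper `sum` is a made-up name used
only for illustration), a null array and a non-null empty array behave the
same in code that only looks at the length and the elements:

```d
import std.stdio : writeln;

// Hypothetical helper for illustration: sums the elements of an array.
// It never looks at ptr; the loop simply runs zero times when length is 0.
int sum(const int[] arr)
{
    int total = 0;
    foreach (x; arr)
        total += x;
    return total;
}

void main()
{
    int[] nullArr;                     // default-initialized: ptr is null, length is 0
    int[] backing = [1, 2, 3];
    int[] emptyArr = backing[0 .. 0];  // ptr is non-null, length is 0

    assert(nullArr.length == 0 && emptyArr.length == 0);
    assert(sum(nullArr) == 0 && sum(emptyArr) == 0);

    // Bounds are checked against length, so a length check is all that
    // is needed before indexing.
    size_t i = 1;
    if (i < backing.length)
        writeln(backing[i]);
}
```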

The result of this is that there's really no reason to care about whether a
D array is null, and code that cares is almost certainly buggy. And what
compounds that is that precisely because D code in general does not care
about null, it's not hard to end up in a situation where you get a null
array when you might have expected an empty non-null array - or in some
cases, you might end up with a non-null empty array when you might have
expected a null one (though the former is more common from what I've seen).
For instance, "".idup will give you null, not a non-null empty string, which
makes perfect sense from an efficiency perspective given that almost no D
code cares about the difference between null and empty. But it's precisely
because almost nothing cares about the difference that it becomes very
error-prone to treat null as special even if you want to.
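
To illustrate how easily the two states blur together, here is a small
sketch. It only asserts what I'm sure of; what a copy such as `"".idup`
produces depends on the runtime, so the example prints that result instead
of assuming it:

```d
import std.stdio : writeln;

void main()
{
    string literal = "";   // empty string literal: ptr is non-null
    string none;           // default-initialized: null

    assert(literal !is null);
    assert(none is null);

    // == compares contents, so the two compare equal...
    assert(literal == none);
    // ...and both have no elements.
    assert(literal.length == 0 && none.length == 0);

    // Whether a copy of an empty string keeps a non-null ptr is up to the
    // implementation, so print it rather than rely on it.
    writeln("\"\".idup is null: ", "".idup is null);
}
```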

For instance, a function could try to return null to indicate that it
doesn't have a result and a non-null empty array to indicate that it has a
result but that that result is empty (and of course a non-empty array when
it has a result that isn't empty). However, while the null return might be
clear and explicit and typically be checked immediately on return, it's
really easy to get into a situation where you accidentally have a null array
when you meant to have a non-null empty array, meaning that such code has a
real risk of returning null when it wasn't intended - which is why such
functions really should be returning something like a std.typecons.Nullable
wrapping an array instead of trying to treat null arrays as special.
Treating null arrays as special in D code is just begging for bugs.
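
A minimal sketch of that alternative, assuming a made-up function `lookup`
and made-up keys purely for illustration: "no result" is carried by the
Nullable, so it no longer matters whether an empty result happens to be null.

```d
import std.typecons : Nullable, nullable;

// Hypothetical lookup, for illustration only: returns "no result" for
// unknown keys and a (possibly empty) array for known ones.
Nullable!(int[]) lookup(string key)
{
    if (key == "some")
        return nullable([1, 2, 3]);
    if (key == "empty")
    {
        int[] nothing;             // a null array, still a valid empty result
        return nullable(nothing);
    }
    return Nullable!(int[]).init;  // no result at all
}

void main()
{
    assert(lookup("missing").isNull);

    auto r = lookup("empty");
    assert(!r.isNull);             // there is a result...
    assert(r.get.length == 0);     // ...and it is empty

    assert(lookup("some").get == [1, 2, 3]);
}
```

The caller checks isNull for "no result" and length (or empty) for "empty
result", and neither check depends on the array's ptr.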

As such, I would generally consider it a code smell to see an array in D
checked for null instead of empty. It might make sense in some situations
when dealing with extern(C) code, but even then, usually you're either
passing a length along with it (in which case, a 0 length array shouldn't be
dereferenced by C code either), or you're dealing with a string and need to
pass a null-terminated string, which typically means allocating a string
anyway rather than returning the ptr of a D string that might be null. But
in the vast majority of D code, checking an array for null almost certainly
means that the code is doing something wrong. Checking pointers for null
makes sense, because you don't want to dereference a null pointer, but D
arrays are not pointers. They contain pointers and will potentially
dereference them if their length isn't 0, but they themselves are not
pointers and aren't going to be dereferenced if their ptr field is null,
because then their length is 0, and it would result in a RangeError.
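
For the extern(C) string case, a sketch along these lines shows why the null
check is usually unnecessary there too. It uses std.string.toStringz and
C's strlen, and it reflects my understanding that toStringz hands back a
valid null-terminated pointer even for a null D string:

```d
import core.stdc.string : strlen;
import std.string : toStringz;

void main()
{
    string s;   // null D string

    // Convert to a null-terminated C string instead of passing s.ptr.
    const(char)* p = toStringz(s);
    assert(p !is null);
    assert(strlen(p) == 0);

    assert(strlen(toStringz("abc")) == 3);
}
```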

And to make matters worse, precisely because there's really no reason to
care about null with arrays, it's often the case that when someone checks an
array implicitly with an if condition, they think that they're testing for
non-empty when they're actually testing for non-null.
So, while it's already a code smell to see `if(arr !is null)`, from what
I've seen, the odds are extremely high that `if(arr)` is just wrong, because
it's not doing what the programmer intended.
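
A small sketch of the trap, using an empty slice of a non-empty array so
that the array is empty but its ptr is non-null:

```d
void main()
{
    int[] backing = [1, 2, 3];
    int[] arr = backing[0 .. 0];   // empty, but ptr is non-null

    bool enteredImplicit = false;
    if (arr)                       // tests for non-null, not for non-empty
        enteredImplicit = true;

    assert(arr.length == 0);
    assert(enteredImplicit);       // branch taken even though arr has no elements

    bool enteredExplicit = false;
    if (arr.length != 0)           // says what is actually meant
        enteredExplicit = true;

    assert(!enteredExplicit);
}
```

Writing `arr.length != 0` (or `!arr.empty` with std.range.primitives
imported) states the intent directly and gives the same answer for null and
non-null empty arrays.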

There's just no good reason to write `if(arr)`, because it's routinely
misunderstood - and that's on top of the fact that `if(arr !is null)` is
almost certainly the wrong behavior anyway. Outside of very rare cases, D
code should not care whether an array is null or merely empty, because there
is normally no need to maintain that distinction, and even trying to
maintain that distinction in a section of code is likely to cause problems
at some point - if nothing else because none of the code interacting with it
will make that distinction.

- Jonathan M Davis