Just a friendly reminder about using arrays in boolean conditions

Wed Nov 27 06:46:18 UTC 2024

On Tuesday, November 26, 2024 10:01:15 PM MST Walter Bright via Digitalmars-d 
wrote:
> On 11/25/2024 1:53 AM, Jonathan M Davis wrote:
> > The core problem is that ptr is checked at all. Whether it's null or not is
> > absolutely irrelevant to almost all D code.
>
> It is relevant in the way I use it, as I will often recycle buffers to avoid
> the free/malloc dance. A non-null pointer tells me it is allocated.

Well, that's kind of a special case, and it relates specifically to memory
management.

D's array operations in general are designed so that they don't care about
the difference between null and empty at all. If you're just using the array
operations, there's really no reason to care about null, and caring about it
actually becomes error-prone.

Where problems tend to crop up is when someone tries to treat null as
indicating that an array has no value, whereas a non-null empty array is a
value. That works just fine with pointers, because a null pointer truly has
no value, and nothing can be done with it as long as it's null. It doesn't
work very well with arrays, though. A null array supports all of the same
operations that a non-null empty array does. As far as the array operations
go, the only real difference is that appending to an empty array _might_
cause more memory to be allocated and the ptr field to point to a new
address, whereas appending to a null array _will_ cause memory to be
allocated and the ptr field to point to a new address.

So, code that tries to treat a null array as an array without a value
quickly runs into problems. Most D code (including most of the language)
simply doesn't make that distinction. All it cares about is whether the
array is empty, and since a null array has a length of 0, it's empty. As a
result, code can easily end up with a null array where you expected a
non-null empty array or vice versa (or it had the one you expected until
some refactoring changed it). And if a piece of code cares about the
difference, it's going to be buggy.
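To make the refactoring hazard concrete, here's a hypothetical sketch (lookupV1, lookupV2, and found are invented names). Both versions of the lookup return an empty result for a missing key, and the array operations treat those results identically, but if(result) doesn't, because a string literal such as "" is non-null.

```d
// Two hypothetical versions of the same lookup, before and after a
// "harmless" refactor that swapped null for "".
string lookupV1(string key) { return key == "name" ? "walter" : null; }
string lookupV2(string key) { return key == "name" ? "walter" : ""; }

// Hypothetical caller that treats null as "no value" via if(result).
bool found(string result)
{
    if (result)   // true for any non-null array, even an empty one
        return true;
    return false;
}

void main()
{
    auto r1 = lookupV1("missing");
    auto r2 = lookupV2("missing");

    // As far as the array operations go, the two results are interchangeable.
    assert(r1.length == 0 && r2.length == 0);
    assert(r1 == r2);

    // But the caller's behavior silently changed with the refactor.
    assert(!found(r1));
    assert(found(r2));
}
```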

In such cases, it's generally better to use a wrapper such as
std.typecons.Nullable to indicate the lack of a value rather than using null
for that, just as you'd have to do with any non-nullable type.
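A sketch of the Nullable approach, assuming a made-up key=value format (settingValue is a hypothetical helper): whether there's a value is tracked by the wrapper, so an empty value and a missing one can't be confused, regardless of whether the slice's ptr happens to be null.

```d
import std.string : indexOf;
import std.typecons : Nullable, nullable;

// Hypothetical parser: the text after '=' in "key=value",
// or an empty Nullable when there is no '=' at all.
Nullable!string settingValue(string setting)
{
    auto i = setting.indexOf('=');
    if (i < 0)
        return Nullable!string.init;       // no value
    return nullable(setting[i + 1 .. $]);  // a value, possibly empty
}

void main()
{
    auto missing = settingValue("verbose");
    auto blank = settingValue("name=");
    auto given = settingValue("name=walter");

    assert(missing.isNull);
    assert(!blank.isNull && blank.get.length == 0);
    assert(!given.isNull && given.get == "walter");
}
```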

Now, if you're specifically using null to check whether an array has been
allocated, because you're managing memory in some fashion, then null tells
you exactly what you need to know. That information is inherent to what null
is, so that's not buggy in the same way. Of course, that then gets into all
of the typical memory management issues (especially with any code that uses
malloc and free rather than the GC), but as far as the array operations go,
it's a non-issue. They don't care about the difference between null and
empty, and they will quite happily allocate new GC memory whenever an
operation requires it, regardless of what kind of memory the array pointed
to beforehand.
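Here's a minimal sketch of buffer recycling along those lines, assuming C's malloc and free from core.stdc.stdlib (ScratchBuffer is a hypothetical type, not a library one). The null check answers exactly the question it's meant to, namely whether anything is allocated yet. The end of main shows the array operations ignoring all of that: appending to the malloc'd slice copies it into new GC memory, and the malloc'd block still has to be freed separately.

```d
import core.stdc.stdlib : malloc, free;

// Hypothetical recyclable buffer: a null array means nothing is allocated.
struct ScratchBuffer
{
    ubyte[] buf;

    // Returns an n-byte slice, reusing the existing allocation when it's big enough.
    ubyte[] get(size_t n)
    {
        assert(n > 0, "n must be non-zero");
        if (buf !is null && buf.length >= n)
            return buf[0 .. n];        // recycle the existing allocation
        release();                     // too small, or nothing allocated yet
        auto p = cast(ubyte*) malloc(n);
        if (p is null)
            assert(0, "malloc failed");
        buf = p[0 .. n];
        return buf;
    }

    // Frees the malloc'd block (if any) and marks the buffer as unallocated.
    void release()
    {
        if (buf !is null)
        {
            free(buf.ptr);
            buf = null;
        }
    }
}

void main()
{
    ScratchBuffer s;
    assert(s.buf is null);             // nothing allocated yet

    auto a = s.get(16);
    auto b = s.get(8);                 // recycled: same memory as a
    assert(b.ptr == a.ptr);

    // The array operations don't know or care that b points at malloc'd
    // memory: appending copies the data into a new GC allocation.
    auto c = b;
    c ~= 42;
    assert(c.ptr != b.ptr && c.length == 9);

    s.release();                       // frees the malloc'd block, not c
    assert(s.buf is null);
}
```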

Regardless, my point is that because D arrays are designed so that their
semantics don't care about null and generally treat null and empty as the
same, code that tries to treat null as special is usually going to result
in bugs (particularly when treating it as special has nothing to do with
memory management). Either way, I would consider it good practice to test
explicitly for null or for empty instead of writing if(arr), because
if(arr) is misunderstood so frequently that the odds are very high that the
programmer who wrote it misunderstood what they were actually testing. And
even if they didn't, you have no way of knowing that when reading their
code. On the other hand, code like if(arr !is null) or if(!arr.empty), with
empty from std.range.primitives, is explicit, so it's clear what was
intended.
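For comparison, here's a small sketch of the three checks side by side (truthy is a hypothetical helper that just reports what if(arr) decides; empty comes from std.range.primitives). Only the non-null empty slice tells them apart, which is exactly the case where if(arr) surprises people.

```d
import std.range.primitives : empty;

// Hypothetical helper: reports what if(arr) decides for a given array.
bool truthy(const(int)[] arr)
{
    if (arr)
        return true;
    return false;
}

void main()
{
    int[] nullArr;
    int[] full = [1, 2, 3];
    int[] tail = full[3 .. $];   // non-null, but no elements left

    // if(arr) vs. the two explicit questions, for each kind of array.
    assert(!truthy(nullArr) && nullArr is null && nullArr.empty);
    assert( truthy(tail)    && tail !is null   && tail.empty);    // the surprising case
    assert( truthy(full)    && full !is null   && !full.empty);
}
```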

> > So, fundamentally, the check for null makes no sense even if it would have
> > made sense with a C array, because a C array is just a naked pointer and
> > has no language protections to ensure that you don't dereference it when
> > it doesn't have elements. D arrays have those protections.
>
> I recycle buffers in C code as well!
>
> BTW, I understand that there can be confusion about what it means. In my own
> code I'm careful to use `buf.length`.

Honestly, I think the confusion is great enough that using arrays directly
in conditions should just be deprecated. Even if that doesn't happen, I
started the thread as a reminder about how if(arr) behaves, in the hope that
more people would become aware of the issue and write fewer bugs. In my
experience, if(arr) is a bug in almost all cases.

Personally, about the only time that I use non-boolean values in a condition
is with pointers when declaring a variable, e.g.

    if(auto value = key in aa)

but other people do it fairly often, and with arrays, it's frequently
misunderstood.
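A small sketch of that idiom (bump is a hypothetical helper, and the AA contents are arbitrary): key in aa yields a pointer to the value, or null if the key is absent, so the condition is a plain null check, and the pointer is only in scope in the branch where it's valid.

```d
// Hypothetical helper: bumps the count for key if it exists; reports whether it did.
bool bump(int[string] counts, string key)
{
    if (auto value = key in counts)   // value is int*, null if key is absent
    {
        ++*value;
        return true;
    }
    return false;                     // value is not in scope here
}

void main()
{
    int[string] counts = ["apples": 1];

    assert(bump(counts, "apples"));
    assert(counts["apples"] == 2);
    assert(!bump(counts, "pears"));
    assert("pears" !in counts);
}
```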

- Jonathan M Davis