Empty string vs null

Tue Feb 4 08:16:26 UTC 2020

On Tuesday, February 4, 2020 12:33:42 AM MST mark via Digitalmars-d-learn 
wrote:
> I have just discovered that D seems to treat empty and null
> strings as the same thing:
>
> // test.d
> import std.stdio;
> import std.string;
> void main()
> {
>      string x = null;
>      writeln("x     = \"", x, "\"");
>      writeln("null  = ", x == null);
>      writeln("\"\"    = ", x == "");
>      writeln("empty = ", x.empty);
>      x = "";
>      writeln("\nx     = \"", x, "\"");
>      writeln("null  = ", x == null);
>      writeln("\"\"    = ", x == "");
>      writeln("empty = ", x.empty);
>      x = "x";
>      writeln("\nx     = \"", x, "\"");
>      writeln("null  = ", x == null);
>      writeln("\"\"    = ", x == "");
>      writeln("empty = ", x.empty);
> }
>
> Output:
>
> x     = ""
> null  = true
> ""    = true
> empty = true
>
> x     = ""
> null  = true
> ""    = true
> empty = true
>
> x     = "x"
> null  = false
> ""    = false
> empty = false
>
> 1. Why is this?

It's a side effect of how dynamic arrays in D are structured. They're
basically

struct DynamicArray(T)
{
    size_t length;
    T* ptr;
}

A null array has a length of 0 and a ptr which is null. So, if you check
length, you get 0. empty checks whether length is 0. So, if you check
whether an array is empty, and it happens to be null, then the result is
true.

Similarly, the code which checks for equality is going to check for length
first. After all, if the lengths don't match, there's no point in comparing
the elements in the array. And if the length is 0, then even if the lengths
match, there's no point in checking the value of ptr, because the array has
no elements. So, whether the array is empty because it's null or whether
it's because its length got reduced to 0 is irrelevant.
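As a rough sketch of that logic (arraysEqual is a name I made up for
illustration, and the real runtime implementation is more involved), the
comparison behaves something like this:

```d
// Hypothetical helper: a simplified picture of what == does for arrays.
// It is not the actual druntime code.
bool arraysEqual(const(char)[] a, const(char)[] b)
{
    // Different lengths can never be equal.
    if (a.length != b.length)
        return false;

    // With a length of 0, this loop never runs.
    foreach (i; 0 .. a.length)
        if (a[i] != b[i])
            return false;

    return true; // ptr was never examined
}

void main()
{
    string s = "abc";
    string nullStr = null;       // length 0, ptr null
    string emptyStr = s[0 .. 0]; // length 0, ptr points into s

    assert(nullStr.ptr is null);
    assert(emptyStr.ptr !is null);

    // Both the sketch and the built-in == treat them as equal.
    assert(arraysEqual(nullStr, emptyStr));
    assert(nullStr == emptyStr);
}
```

Since ptr never takes part in the comparison, a null array and a non-null
array of length 0 come out equal.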

The natural result of all of this is that D treats null arrays and empty
arrays as almost the same thing. They're treated differently if you use the
is operator, because that checks that the two values are the same bitwise.
For instance, in the case of pointers or class references, it checks their
pointer values, not what they point to. And in the case of dynamic arrays,
it's comparing both the length and ptr values. So, if you want to check
whether a dynamic array is really null, then you need to use the is operator
instead of ==. e.g.

writeln(arr is null);

instead of

writeln(arr == null);
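To make the difference concrete, here's a small sketch (isReallyNull is
just a name picked for this example) that runs both checks on a null
string and on an empty string that isn't null:

```d
// Hypothetical helper, named only for this example.
bool isReallyNull(const(char)[] arr)
{
    return arr is null; // compares both length and ptr
}

void main()
{
    string s = "abc";
    string nullStr = null;
    string emptyStr = s[0 .. 0]; // empty, but ptr is not null

    // == only cares about length and elements, so both look "null".
    assert(nullStr == null);
    assert(emptyStr == null);

    // is tells them apart.
    assert(isReallyNull(nullStr));
    assert(!isReallyNull(emptyStr));
    assert(emptyStr !is nullStr);
}
```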

As a side note, when using an array directly in the condition of an if
statement or assertion, it's equivalent to checking whether it's _not_ null.
So,

if(arr) {...}

is equivalent to

if(arr !is null) {...}

Because of how a null array is an empty array, some people expect the array
to be checked for whether it's non-empty in those situations, which can
cause confusion.
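A short sketch of where that expectation goes wrong (passesIf and
isNonEmpty are names invented for this example):

```d
// Hypothetical helpers, named only for this example.
bool passesIf(const(char)[] arr)
{
    if (arr) // behaves like: if (arr !is null)
        return true;
    return false;
}

bool isNonEmpty(const(char)[] arr)
{
    return arr.length != 0;
}

void main()
{
    string s = "abc";
    string nullStr = null;
    string emptyStr = s[0 .. 0]; // empty, but not null

    assert(!passesIf(nullStr));
    assert(passesIf(emptyStr));    // the branch is taken for an empty array
    assert(!isNonEmpty(emptyStr)); // even though it has no elements
    assert(passesIf(s) && isNonEmpty(s));
}
```

So if the intent is "does this array have any elements", checking length
(or empty) says that directly.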

> 2. Should I prefer null or ""? I was hoping to return null to
> indicate "no string that match the criteria", and "some string"
> otherwise.

In most situations, it really doesn't matter whether you use
null or "" except that "" is automatically a string, whereas null can be
used as a literal for any type of dynamic array (in fact typeof(null) is its
own type in order to deal with that in generic code). The reason that

"" is null

is false is because all string literals in D have a null character one past
their end. This is so that you can pass them directly to C functions without
having to explicitly add the null character. e.g. both

printf("hello world");

and

printf("");

work correctly, because the compiler implicitly uses the ptr member of the
strings, and the C code happily reads past the end of the array to the null
character, whereas ""[0] would throw a RangeError in D code. Strings that
aren't literals don't have the null character unless you explicitly put it
there, and they require that you use ptr explicitly when calling C
functions, but for better or worse, string literals don't force that on you.
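For strings that aren't literals, one common option is toStringz from
std.string, which gives you a null-terminated pointer. A minimal sketch
(cLength is a made-up helper for this example, and strlen just stands in
for whatever C function you're calling):

```d
import core.stdc.string : strlen;
import std.string : toStringz;

// Hypothetical helper: asks C how long it thinks the string is.
size_t cLength(string s)
{
    // toStringz makes sure the data is followed by a '\0'.
    return strlen(s.toStringz);
}

void main()
{
    // A literal converts implicitly to const(char)*.
    assert(strlen("hello") == 5);

    // The terminator sits one past the end of the literal's data.
    string lit = "hi";
    assert(lit.ptr[2] == '\0');

    // A slice has no terminator of its own right after it, so passing
    // slice.ptr to C directly would read on into " world".
    string full = "hello world";
    string slice = full[0 .. 5];
    assert(cLength(slice) == 5);
}
```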

There are definitely experienced D programmers who differentiate between
null and empty arrays / strings in their code (in fact, that's why if(arr)
ultimately wasn't deprecated even though a number of people were pushing for
it because of how it confuses many people). However, there are also plenty
of D programmers who would argue that you should never treat null as special
with arrays because of how null arrays are empty instead of being treated as
their own thing.

Personally, I would say that if you want to differentiate between null and
empty, it can be done, but you need to be careful - especially if this is
going to be a function in a public API rather than something local to your
code. It's really easy to end up with a null array when you didn't expect to
- especially if your function is calling other functions that return arrays.

So, if you had a function that returned null when it fails, that _can_ work,
but you would either have to make sure that success never resulted in an
empty array being returned, or you would have to make it clear in the
documentation that the is operator must be used to check for null rather
than == and ensure that even if an empty array is returned, it will never
be null. It can work, but ultimately, for public APIs, it's arguably better to
use Nullable from std.typecons to differentiate. It has the downside that
the return type is larger, but it's less error-prone. For code that isn't
part of a public API (especially code that only you work on), it's less
risky to explicitly return null rather than using Nullable, but it's still a
risk - especially if the code gets changed over time.
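As a sketch of what the Nullable approach could look like (the lookup
function and the table contents are invented for this example, not taken
from your code):

```d
import std.typecons : Nullable;

// Hypothetical function: returns the value stored for key, or a
// Nullable in the null state when there is no such key.
Nullable!string lookup(string[string] table, string key)
{
    if (auto p = key in table)
        return Nullable!string(*p);
    return Nullable!string.init;
}

void main()
{
    string[string] table = ["name": "", "lang": "D"];

    // Found, and the value happens to be an empty string.
    auto found = lookup(table, "name");
    assert(!found.isNull);
    assert(found.get == "");

    // Not found at all.
    auto missing = lookup(table, "nope");
    assert(missing.isNull);
}
```

The caller has to go through isNull, so "no match" can't be confused with
"matched an empty string", and it doesn't matter whether the empty string
happens to have a null ptr.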

- Jonathan M Davis