is ==

Mon May 21 19:59:41 UTC 2018

On 5/21/18 3:20 PM, Jonathan M Davis wrote:
> On Monday, May 21, 2018 14:40:24 Steven Schveighoffer via Digitalmars-d-learn
> wrote:
>> For me, the code smell is using arr is null (is it really necessary to
>> check for a null pointer here?), for which I always have to look at more
>> context to see if it's *really* right.
> 
> Really? I would never expect anyone to use is unless they really cared about
> whether the array was null. I'd be concerned about whether the code in general
> was right, because treating null as special gets tricky, but that particular
> line wouldn't concern me.

Don't get me wrong, they probably *do* mean to check if it's null. But 
do they *need* to check?

I'll borrow from your example below:

if(arr != null && arr == arr2)

Replace this with:

if(arr !is null && arr == arr2)

Does this look any better? I don't think so; it means the person didn't 
understand what an array actually is. Even though the second actually 
has some semantic meaning (the first is a no-op), it's likely not what 
the author intended.
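A minimal D sketch of where the two conditions part ways (the variable names are illustrative). They only disagree for an empty array whose pointer is not null, which is rarely the distinction the author had in mind:

```d
import std.stdio : writeln;

void main()
{
    int[] nullArr;                      // default: null pointer, length 0
    int[] backing = [1, 2, 3];
    int[] emptySlice = backing[0 .. 0]; // length 0, pointer into backing

    // != compares contents; both arrays are empty, so both "equal" null.
    writeln(nullArr != null, " ", emptySlice != null);   // false false

    // !is compares identity (pointer and length) with null.
    writeln(nullArr !is null, " ", emptySlice !is null); // false true
}
```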

In most cases, when they check for null, they just want to check to see 
if the array is unset. The semantic meaning of this is usually that it's 
empty; they don't really care whether it's actually null or not. In that 
case, checking for exact nullness is actually more expensive, and prone 
to problems.

This comes from many languages where an array is an object type that 
defaults to null, and we reinforce that misconception by allowing null 
as a valid array literal.

>> Even people who write == null may want to check for null thinking that
>> it's how you check an array is empty, not realizing that it *doesn't*
>> check for a null pointer, *AND* it still does exactly what they need it
>> to do ;)
> 
> You honestly expect someone first coming to D to check whether an
> array is empty by checking null? That's a bizarre quirk of D that I have
> never seen anywhere else. I would never expect anyone to purposefully use
> == null to check for empty unless they were very familiar with D, and even
> then, I'd normally expect them to ask what they really mean, which is
> whether the array is empty.

Reread what I said. They *think* they need to check if it's null 
(it being the mythical Array object type that the language no doubt 
lowers to, just like it does in Java or C# or Swift or...), but really, 
they only need to check if it's empty, which is exactly what == null 
does.

For instance:

int[] arr;

if (cond)
{
    ...
    arr = new int[5];
    ...
}

if (arr == null)

Now, you can certainly replace arr == null with arr is null, and the 
code works fine -- identically, even though it's more expensive. But to 
me, the arr is null is a red flag. Does the person know that they are 
checking for the ACTUAL value null? You still have to read the code to 
figure it out! I would say most times it's a bug waiting to happen. I 
can't imagine you just see "arr is null" and move on believing the 
author knew what they were doing.

>>> It's the same reason that
>>>
>>> if(arr)
>>>
>>> was temporarily out of the language.
>>
>> It's similar, but I consider it a different reason. While the intent of
>> == null may not be crystal clear, 99% of people don't care about the
>> pointer, they just care whether it's empty. So the default case is
>> usually good enough, even if you don't know the true details.
> 
> I think that that's the key point of disagreement here. I would never
> consider the intent of == null to be crystal clear based solely on the code,
> because it is so common outside of D to use == null to actually check for
> null, and there are better ways in D to check for empty if that's what you
> really mean. My immediate expectation on seeing arr == null is that the
> programmer does not properly understand arrays in D. If I knew that someone
> like you wrote the code, I'd probably decide that you knew what you were
> doing and didn't make a mistake, but I'm not going to assume that in
> general, and honestly, I would consider it bad coding practice (though we
> obviously disagree on that point).

The fundamental reason why == null is generally OK is that, generally, 
the person doesn't distinguish between a null array and a non-null but 
empty one. Believe it or not, this is my position as well. Either works 
fine for their code, and in fact, when you analyze the code, checking 
for emptiness is really what they mean.

Consider that new T[0] returns a null array. What happens if it returned 
a non-null array? Only code that uses "is null" would break. Code that 
uses == null would work fine.
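A small sketch of that thought experiment. `isUnset` is a hypothetical helper, and a zero-length slice of an existing array stands in for the non-null result new int[0] would give if the runtime changed. Whether new int[0] is null today is deliberately not asserted, since that is exactly the implementation detail in question:

```d
import std.stdio : writeln;

// Hypothetical helper, written the forgiving way.
bool isUnset(const(int)[] arr)
{
    return arr == null; // compares contents, so only the length matters here
}

void main()
{
    auto fromNew = new int[0];           // per the post, currently a null array
    int[] backing = [1];
    auto nonNullEmpty = backing[0 .. 0]; // stands in for a non-null new int[0]

    // Either way, the == null check gives the same answer.
    writeln(isUnset(fromNew), " ", isUnset(nonNullEmpty)); // true true

    // An `is null` check would flip for the non-null variant.
    writeln(nonNullEmpty is null); // false
}
```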

> I would consider the if(arr) and arr == null cases to be exactly the same.
> They both are red flags that the person in question does not understand how
> arrays in D work. Yes, someone who knows what they're doing may get it
> right, but I'd consider both to be code smells and I wouldn't purposefully
> do either in my own code. If I found either in my own code, I would expect
> that I'd just found a careless bug.

== null is way more forgiving than if(arr). That is the point I'm 
making. Both can be used incorrectly, but only one is going to have big 
problems with implementation details.

>>
>> If we never had null be the default value for an array, and used []
>> instead, I would be actually OK with that. I also feel one of the
>> confusing things for people coming to the language is that arrays are
>> NOT exactly reference types, even though null can be used as a value for
>> assignment or comparison.
>>
>> But it still wouldn't change what most people write or mean, they just
>> would write == [] instead of == null. I don't see how this would solve
>> any of your concerns.
> 
> It would solve the concern, because no one is going to write arr == [] to
> check for null. They'd write it just like they'd write arr == "".

I disagree. I think that's exactly what they would write, either that or 
arr == arr.init.
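To illustrate: in D these spellings all perform the same length-based comparison, so a newcomer reaching for any of them gets identical behavior. The function names are hypothetical:

```d
import std.stdio : writeln;

bool viaNull(int[] arr)         { return arr == null; }
bool viaEmptyLiteral(int[] arr) { return arr == []; }
bool viaInit(int[] arr)         { return arr == arr.init; } // int[].init is null

void main()
{
    int[] nullArr;
    int[] backing = [1, 2];
    int[][] cases = [nullArr, backing[0 .. 0], backing];
    foreach (arr; cases)
        writeln(viaNull(arr), " ", viaEmptyLiteral(arr), " ", viaInit(arr));
    // true true true
    // true true true
    // false false false
}
```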

I'd posit that most people who write == null are checking to see if an 
array has been initialized or not. Initialized meaning "I assigned some 
length of elements to it". This works whether the pointer is null or not.

> They're
> clearly checking for empty, not null. The whole problem here is that pretty
> much everywhere other than D arrays, null and empty are two separate things,
> and pretty much anyone coming from another language will expect them to be
> different. It wouldn't surprise me at all to see a newbie D programmer doing
> something like
> 
> if(arr != null && arr == arr2)
> {...}

This is actually WAY easier to understand than just arr != null by 
itself. Clearly the user thinks arr is an object as I discussed above. 
It's not concerning at all, you just say "you don't need to check for 
null, it's not really an object".

> And out of those who do understand how D dynamic arrays work, a number of
> them continue to distinguish between null and empty arrays in their code -
> e.g. folks like Andrei and Vladimir who write code that uses
> 
> if(arr)
> 
> and means it the way the language means it. The core problem is that D
> treats null arrays as empty. If it would either treat them as actually null
> (with all of the segfaults that go with that) or not treat null as a dynamic
> array, then that whole problem goes away. So, if null were not a dynamic
> array in any shape or form, and you had to use [] to indicate an empty
> array, then that would solve my main concerns with null and dynamic arrays.

Code that uses if(arr) is prone to issues, because whether a function 
returns an empty array with a null pointer or a non-null one is really 
an implementation detail. I wouldn't consider any code that depends on 
that implementation detail to be robust.
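A sketch of how that bites, using a hypothetical `afterColon` function whose two "no value" paths happen to produce different kinds of empty array. if(arr) is true for any non-null array, even an empty one, so its answer depends on which path ran, while == null does not care:

```d
import std.stdio : writeln;

// Hypothetical helper: returns whatever follows the first ':' in `s`.
// "No colon" and "colon but nothing after it" both mean "no value", yet
// the two return paths produce different kinds of empty array.
string afterColon(string s)
{
    foreach (i, c; s)
    {
        if (c == ':')
            return s[i + 1 .. $]; // possibly empty, but still points into s
    }
    return null;                  // empty and null
}

// Reports how `if (arr)` sees the array.
string truthiness(string arr)
{
    if (arr)
        return "truthy";
    return "falsy";
}

void main()
{
    auto a = afterColon("key");  // no colon: null
    auto b = afterColon("key:"); // colon at the end: empty, non-null

    writeln(a == null, " ", b == null);         // true true
    writeln(truthiness(a), " ", truthiness(b)); // falsy truthy
}
```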

If we couldn't use null and had to use [] instead, then it wouldn't 
*look* as incorrect, and that would probably clear up some of the 
confusion. But really, it's no different, and I'm sure you'd still see 
code like:

if(arr != [] && arr == arr2)

-Steve