Empty VS null array?

Fri Oct 25 04:41:36 PDT 2013

On Monday, 21 October 2013 at 10:33:01 UTC, Regan Heath wrote:
> null strings are no different to null class references, they're 
> not a special case.

True. That's an implementation detail which has no meaning for 
business logic. When implementation deviates from business logic, 
one ends up fixing the implementation details everywhere in order 
to implement business logic. That's why string.IsNullOrEmpty is 
used.

> People seem to have this odd idea that null is somehow an 
> invalid state for a string /reference/ (c# strings are 
> reference types), it's not.

That's the very problem: null and empty are both valid states and 
must be treated equally as "no data", but they can't be, for 
purely technical reasons.

> People also seem to elevate empty strings to some sort of 
> special status, that's like saying 0 has some special status 
> for int - it doesn't it's just one of a number of possible 
> values.
>
> In fact, int having no null like state is a "problem" causing 
> solutions like boxing to elevate the value type to a reference 
> in order to allow a null state for int.

You want to check ints for null everywhere too?

> Yet, in D we've decided to inconsistently remove that 
> functionality from string for no gain.  If string could not 
> actually be null then we'd gain something from the limitation, 
> instead we lose functionality and gain nothing - you still have 
> to check your strings for null in D.

Huh? Null slices work just like empty ones - that's why this 
topic was started in the first place. One doesn't have to check 
slices for nulls, only for length.
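
To make that concrete, here is a minimal sketch of how a null 
slice and an empty one compare in D. The hasData helper is a 
hypothetical name used only for illustration, not something from 
Phobos.

```d
import std.range.primitives : empty;

// Hypothetical helper: treats null and "" alike as "no data".
bool hasData(string s)
{
    return s.length != 0;
}

void main()
{
    string unset = null; // null slice: null pointer, length 0
    string blank = "";   // empty slice: non-null pointer, length 0

    // Length-based checks cannot tell the two apart.
    assert(unset.length == 0 && blank.length == 0);
    assert(unset.empty && blank.empty);
    assert(unset == blank); // == compares contents, not pointers

    // Only an identity check with `is` still sees a difference.
    assert(unset is null);
    assert(blank !is null);

    assert(!hasData(unset) && !hasData(blank));
    assert(hasData("x"));
}
```

So code that only ever asks for the length never needs a 
separate null check.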

If you want clear nullable semantics, you have Nullable; it works 
for everything, including strings and ints. You would want this 
feature only in rare cases, so it doesn't make sense to make it 
the default, or it will be a nuisance.
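
A rough sketch of what that looks like with std.typecons.Nullable 
follows. The findNickname function and its data are made up for 
the example; only Nullable, isNull and get come from Phobos.

```d
import std.typecons : Nullable;

// Hypothetical lookup: a null result means "not specified",
// which is distinct from a value that happens to be empty.
Nullable!string findNickname(string login)
{
    Nullable!string result; // starts out in the null state
    if (login == "alice")
        result = "";        // specified, but blank
    return result;
}

void main()
{
    auto known = findNickname("alice");
    auto unknown = findNickname("nobody");

    assert(!known.isNull && known.get.length == 0);
    assert(unknown.isNull);

    // The same wrapper gives an int a null state, no boxing needed.
    Nullable!int count;
    assert(count.isNull);
    count = 0;
    assert(!count.isNull && count.get == 0);
}
```

The distinction is opt-in: only the rare API that needs "not 
specified" pays for it, and plain strings stay simple.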

>> both of them are just "no data", so you end up typing 
>> if(string.IsNullOrEmpty(mystr)) every time everywhere.
>
> I only have to code like this when I use 3rd party code which 
> has conflated empty and null.  In my code when it's null it 
> means not specified, and empty is just one type of value - for 
> which I do no special handling.

Equivalence between null and empty is a business logic 
requirement; that's why it's done.

>> And, yeah, only one small feature in this big mess ever needs 
>> to differentiate between null and empty.
>
> Untrue, null allows many alternate and IMO more direct/obvious 
> designs.

The need for those designs is rare and trivially implementable 
for all value types.

>> I found this one case trivially implementable, but nulls still 
>> plague all remaining code.
>
> Which one case?  The readline() one below?

No, it was an authentication system in third-party code, for one 
special case. I also had to specify this null value in app.config 
explicitly (guess how), rather than have a missing parameter 
substituted with a default.

Another possibility for readline is to return a tuple
{bool eof, string line(non-null)} - this way you have an easy 
check for eof and don't have to check for null when you don't 
need it.
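
Something along these lines, as a sketch only. LineSource and 
LineResult are hypothetical names, and the reader walks an 
in-memory array instead of a real stream to keep the example 
self-contained.

```d
import std.typecons : Tuple;

// Hypothetical result type: eof flag plus the line that was read.
alias LineResult = Tuple!(bool, "eof", string, "line");

// Hypothetical reader over an in-memory array of lines.
struct LineSource
{
    string[] lines;

    LineResult readLine()
    {
        if (lines.length == 0)
            return LineResult(true, ""); // end of input, empty line
        auto line = lines[0];
        lines = lines[1 .. $];
        return LineResult(false, line);
    }
}

void main()
{
    auto src = LineSource(["name", "value"]);

    // Multiline format: read two lines, then expect end of input.
    auto key = src.readLine();
    auto val = src.readLine();
    auto end = src.readLine();

    assert(!key.eof && key.line == "name");
    assert(!val.eof && val.line == "value");
    assert(end.eof && end.line.length == 0);
}
```

A caller that cares about end of file checks eof; a caller that 
doesn't can use line directly without a null check.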

> I use this all the time:
> http://msdn.microsoft.com/en-us/library/system.io.streamreader.readline.aspx
>
> It has never caused me any issues.  It explicitly states that 
> null is a possible output, and so I check for it - doing 
> anything less is simply bad programming.
>
>> It works if you read one line per loop cycle, but if you read 
>> several lines and assume they're not null (some multiline data 
>> format),
>
> There is your problem, never "assume" - the documentation is 
> very clear on the issue.
>
>> you're screwed or your code becomes littered with null checks, 
>> but who accounts for all alternative scenarios from the start?
>
> Me, and IMO any competent programmer.  It is misguided to think 
> you can ignore valid states, null is a valid state in C, C++, 
> C#, and D.. You should be thinking about and handling it.

Here null is a valid state for readline, not for the caller: if 
the caller parses a multiline data format, unexpected end of file 
is an invalid state.

And what do you gain by littering your code with those null 
checks? Just keeping the runtime happy and adding noise to the 
code? You could use that time to improve the code, add features, 
or even relax. Nullable strings are exactly what gains you 
nothing but wasted time.

> You don't have to check for it on every access to the variable, 
> but you do need to check for it once where the variable is 
> assigned, or passed (in private functions you can skip this).  
> From that point onward you can assume non-null, valid, job done.

You just said "never assume". The assumption may fail, because 
the string type is still nullable; the compiler doesn't save you 
here, and this sucks. And in order to check for everything 
everywhere at a level near that of the compiler, you must be not 
just competent, but perfect.

>> I believe there's no problem domain, which would like to 
>> differentiate between null and empty string instead of 
>> treating them as "no data".
>
> null means not specified, non existent, was not there.
> empty means, present but set to empty/blank.
>
> Databases have this distinction for a reason.

Oracle makes no distinction between null and empty string. For a 
reason?
A database is an implementation detail of data storage; it 
doesn't implement business logic, it only provides features that 
can be used with more or less success to implement business 
logic. Ever heard of the advantages of OO databases over 
relational ones? That's an illustration of technical details that 
don't map precisely to business logic.

> If you get input from a user a field called "foo" may be:
>  - not specified
>  - specified
>
> and if specified, may be:
>  - empty
>  - not empty

If the user doesn't fill in a text box, it's both empty and not 
specified - there's just no difference. And it doesn't matter how 
you store it in the database - as null or as an empty string - 
both are presented in the same way. Heck, we use these optional 
text boxes everywhere - can you tell if their content is empty or 
not specified?

And what if the value is required? Would you accept an empty 
value? And if your database treats an empty string as not null, 
would you allow a user to register with an empty login name? And 
how would you express this constraint in the database? In SQL 
"not null" means "required value", but it's not equivalent to the 
business logic's notion of a required value. I wouldn't be 
surprised if Oracle did that in order to reject empty strings in 
not null fields.

Let's consider the process of entering a user's data. What text 
fields do we have?
1. Login. No difference between null and empty - both invalid - 
"no data", must enter something.
2. First name. No difference between null and empty - both are 
"no data" and are presented as empty text box.
3. Middle name. ditto.
4. Last name. ditto.
5. Country. ditto.
6. State. ditto.
7. City. ditto.
8. Address. ditto.
9. Building. ditto.
10. Flat. ditto.
11. Zip code. ditto.
12. Phone. ditto.
13. Fax. ditto.
14. E-mail. ditto.
15. Site. ditto.
16. Passport number. ditto.
17. Birth place. ditto.
18. Comment. Hell! Comment!
See? Not a single field in the list requires distinction between 
null and empty. And slices don't differentiate between them. Just 
as planned.

> If we have null, lets use it, if we want to remove null the 
> lets remove it, but can we get out of this horrid middle ground 
> please.

*sigh* people just don't buy the KISS principle...