Empty VS null array?

Regan Heath regan at netmail.co.nz
Mon Oct 21 03:33:02 PDT 2013


On Sat, 19 Oct 2013 10:56:02 +0100, Kagamin <spam at here.lot> wrote:

> On Friday, 18 October 2013 at 10:44:11 UTC, Regan Heath wrote:
>> This comes up time and again.  The use of, and ability to distinguish  
>> empty from null is very useful.  Yes, you run the risk of things like  
>> null pointer exceptions etc, but we have that risk now without the  
>> reward of being able to distinguish these cases.
>
> In C# code null strings are a plague.

I code in C# every day for work and I never have any problems with null  
strings.  The conflated empty/null cases are the real nightmare for me  
(more below).

Null strings are no different to null class references; they're not a  
special case.  People seem to have this odd idea that null is somehow an  
invalid state for a string /reference/ (C# strings are reference types).  
It's not.

People also seem to elevate empty strings to some sort of special status.  
That's like saying 0 has some special status for int.  It doesn't; it's  
just one of a number of possible values.

In fact, int having no null-like state is a "problem", prompting solutions  
like boxing, which elevate the value type to a reference in order to allow  
a null state for int.

Yet, in D we've decided to inconsistently remove that functionality from  
string for no gain.  If string could not actually be null then we'd gain  
something from the limitation, instead we lose functionality and gain  
nothing - you still have to check your strings for null in D.
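
To make the D side of this concrete, here is a minimal sketch of how a null  
string and an empty string compare.  The helper name classify is made up  
for illustration; the point is only that identity (is) can still see the  
difference while equality and length cannot.

```d
// Hypothetical helper: classifies a string using identity, which can see null.
string classify(string s)
{
    if (s is null)
        return "null";      // no array at all
    if (s.length == 0)
        return "empty";     // an array that exists but holds nothing
    return "value";
}

void main()
{
    string unset = null;    // null slice: ptr is null, length is 0
    string blank = "";      // empty literal: ptr is non-null, length is 0

    // Identity comparison tells the two apart...
    assert(classify(unset) == "null");
    assert(classify(blank) == "empty");

    // ...but equality and length treat them as the same value.
    assert(unset == blank);
    assert(blank == null);
    assert(unset.length == blank.length);
}
```

So the distinction exists, but only for code that remembers to use is  
rather than ==.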

We ought to go one way or the other, this middle ground is worse than  
either of the other options.

In my code I don't have to check for or treat empty strings any  
differently to other values.  I simply have to check for null.   
Remembering to check for null on reference types is automatic for me,  
strings are not special in this regard.

> Most of the time you don't need them

Sure, and if I don't have access to null (like when using a value type  
like int), I can code around that lack, but it's never as straightforward  
a solution.

> but still must check for them just in order to not get an exception.

Sure, you must check for the possible states of a reference type.

> Also business logic makes no difference between null and empty

This is simply not true.  Example at the end.

> both of them are just "no data", so you end up typing  
> if(string.IsNullOrEmpty(mystr)) every time everywhere.

I only have to code like this when I use 3rd party code which has  
conflated empty and null.  In my code when it's null it means not  
specified, and empty is just one type of value - for which I do no special  
handling.

> And, yeah, only one small feature in this big mess ever needs to  
> differentiate between null and empty.

Untrue, null allows many alternate and IMO more direct/obvious designs.

> I found this one case trivially implementable, but nulls still plague  
> all remaining code.

Which one case?  The readline() one below?

>> Take this simple design:
>>
>>   string readline();
>>
>> This function would like to be able to:
>>  - return null for EOF
>>  - return [] for a blank line
>>
>> but it cannot, because as soon as you write:
>>
>>   foo(readline())
>>
>> the null/[] case merges.
>
> This is a horrible design. You better throw an exception on eof instead  
> of null:

No, no, no.  You should only throw in exceptional circumstances or you  
risk using exceptions for flow control, and that is just plain horrid.

> this null will break the caller anyway possibly in a contrived way.

Never a contrived way, always a blatantly obvious one and only if you're  
not doing your job properly.  If you want a contrived, unpredictable and  
difficult to debug breakage look no further than heap or stack  
corruption.  Null is never a difficult bug to find and fix, and is no  
different to forgetting to handle one of the integer return values of a  
function.

I use this all the time:
http://msdn.microsoft.com/en-us/library/system.io.streamreader.readline.aspx

It has never caused me any issues.  It explicitly states that null is a  
possible output, and so I check for it - doing anything less is simply bad  
programming.
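
The same pattern can be sketched in D.  This is only an illustration: the  
LineSource struct and countLines are hypothetical stand-ins for a real  
stream, and the sketch assumes blank lines are stored as the literal "",  
which is empty but not null.  It works only while every caller tests with  
is null; a test with == null would also stop at the first blank line.

```d
// Hypothetical in-memory line source, standing in for a real stream.
struct LineSource
{
    string[] pending;   // lines not yet handed out

    // Returns the next line, or null once the input is exhausted.
    // A blank line comes back as "", which is empty but not null.
    string readline()
    {
        if (pending.length == 0)
            return null;
        string line = pending[0];
        pending = pending[1 .. $];
        return line;
    }
}

// Counts total and blank lines, stopping at the null that marks EOF.
size_t[2] countLines(LineSource src)
{
    size_t total, blank;
    for (;;)
    {
        string line = src.readline();
        if (line is null)       // must be `is`: `line == null` is also true for ""
            break;
        ++total;
        if (line.length == 0)
            ++blank;
    }
    return [total, blank];
}

void main()
{
    auto counts = countLines(LineSource(["first", "", "last"]));
    assert(counts[0] == 3);     // three lines were read before EOF
    assert(counts[1] == 1);     // one of them was blank
}
```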

> It works if you read one line per loop cycle, but if you read several  
> lines and assume they're not null (some multiline data format),

There is your problem, never "assume" - the documentation is very clear on  
the issue.

> you're screwed or your code becomes littered with null checks, but who  
> accounts for all alternative scenarios from the start?

Me, and IMO any competent programmer.  It is misguided to think you can  
ignore valid states; null is a valid state in C, C++, C#, and D.  You  
should be thinking about and handling it.

You don't have to check for it on every access to the variable, but you do  
need to check for it once where the variable is assigned, or passed (in  
private functions you can skip this).  From that point onward you can  
assume non-null, valid, job done.
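
A rough sketch of that "check once at the boundary" habit, using a class  
reference.  The Settings class and the describe, label and lengthText  
functions are invented for this example; enforce and collectException come  
from std.exception.

```d
import std.conv : to;
import std.exception : collectException, enforce;

// Hypothetical reference type used only for this example.
class Settings
{
    string name;
    this(string name) { this.name = name; }
}

// Public entry point: the one place where null is checked.
string describe(Settings s)
{
    enforce(s !is null, "describe: settings must not be null");
    return label(s) ~ " (" ~ lengthText(s) ~ ")";
}

// Private helpers are only reached through describe, so they assume non-null.
private string label(Settings s)
{
    return "name=" ~ s.name;
}

private string lengthText(Settings s)
{
    return s.name.length.to!string ~ " chars";
}

void main()
{
    assert(describe(new Settings("abc")) == "name=abc (3 chars)");

    // A null argument is rejected at the boundary, before any helper runs.
    assert(collectException(describe(null)) !is null);
}
```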

>> There are plenty of other such design/cases that can be imagined, and  
>> while you can work around them all they add complexity for zero gain.
>
> I believe there's no problem domain, which would like to differentiate  
> between null and empty string instead of treating them as "no data".

null means not specified, non existent, was not there.
empty means, present but set to empty/blank.

Databases have this distinction for a reason.

If you get input from a user a field called "foo" may be:
  - not specified
  - specified

and if specified, may be:
  - empty
  - not empty

If foo is not specified you may want to assign a default value for it, if  
your business logic is using empty to mean "not specified" you prevent the  
user actually setting foo to empty and that limitation is a right pain in  
many cases.

You can code around this by using a boolean or a dictionary to indicate the  
specified/not specified distinction, but this is less direct than simply  
using null.
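
As a small D sketch of the two approaches (the function names and the  
"dflt" fallback are made up for illustration): the first treats null as  
"not specified", the second tracks presence with an associative array.

```d
// Null means "not specified": fall back to the default.
// Anything else, including "", is what the user asked for.
string resolveFoo(string foo, string fallback)
{
    return foo is null ? fallback : foo;
}

// The workaround: presence is tracked by whether the key exists at all.
string resolveFooFrom(string[string] fields, string fallback)
{
    auto p = "foo" in fields;   // pointer to the value, or null if absent
    return p is null ? fallback : *p;
}

void main()
{
    // Not specified: the default applies.
    assert(resolveFoo(null, "dflt") == "dflt");
    // Specified as empty: the user's empty value is kept.
    assert(resolveFoo("", "dflt").length == 0);
    // Specified and not empty.
    assert(resolveFoo("bar", "dflt") == "bar");

    string[string] fields;
    assert(resolveFooFrom(fields, "dflt") == "dflt");
    fields["foo"] = "";
    assert(resolveFooFrom(fields, "dflt").length == 0);
}
```

Both work, but the second needs a container around every field just to  
carry one extra bit of information.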

If we have null, let's use it; if we want to remove null then let's remove  
it.  But can we get out of this horrid middle ground, please?

Regan
