Empty VS null array?

Mon Oct 28 04:49:52 PDT 2013

I find that I have repeated myself a lot in each section/reply below.  I am  
not sure whether you'd prefer I just reply with those points once, or  
inline.  I chose inline so as to make it clear I was not ignoring your  
points, and to make it clear which of my arguments apply to which point...

:)

On Fri, 25 Oct 2013 12:41:36 +0100, Kagamin <spam at here.lot> wrote:
> On Monday, 21 October 2013 at 10:33:01 UTC, Regan Heath wrote:
>> null strings are no different to null class references, they're not a  
>> special case.
>
> True. That's an implementation detail which has no meaning for business  
> logic.

This argument applies both ways.  If D conflates null and empty, then this  
restricts business logic with an implementation detail.  We agree that D  
has no place in defining business logic, therefore it follows that the  
more flexible option is preferable as it is neutral in its effect on  
business logic.

However, this decision, like most, is a cost/benefit analysis and in the  
case of strings the case can be made that they should be a value type, and  
never null.  I can get behind such a decision, as it would mean D was  
taking a side, finally.  If strings cannot be null then we actually  
benefit from the current conflation of the two, by avoiding having to do  
null reference checking, and the associated exception/crash.  I would  
prefer to go the other way and allow a consistent null/empty distinction  
but either option is better than the status quo where we have to check for  
null ("cost") but gain no benefit from this, because we cannot use the  
null state consistently.

> When implementation deviates from business logic, one ends up fixing the  
> implementation details everywhere in order to implement business logic.  
> That's why string.IsNullOrEmpty is used.

I almost never need to use string.IsNullOrEmpty.  The reason why is  
simple.  An empty string is just one value a string may hold, and my code  
does not "generally" treat it as special except in certain specific cases  
where I make that additional check (your blank username example, for  
one).  Null is the only "special" state a string reference can have, so I  
check for this and this alone.

>> People seem to have this odd idea that null is somehow an invalid state  
>> for a string /reference/ (c# strings are reference types), it's not.
>
> That's the very problem: null and empty are valid states and must be  
> treated equally as "no data", but they can't for purely technical  
> reasons.

I never treat null and empty equally as "no data"; that is my whole  
point.  They are not the same thing conceptually, and you should never  
treat them as the same thing.  null means "no data", empty is just one  
possible state of "data".

You might make the business logic decision of disallowing empty values, of  
treating an empty value as if no value was given.  The two would still be  
conceptually separate, but your code would be making the decision to treat  
them in the same way.  You encode this decision in the function which  
accesses the input, once, and your problems are all solved.

If you make the mistake of conflating null and empty in your input layer  
then you restrict your "business logic" and create the very problem you're  
complaining about here.  Stop conflating them and the problem simply  
vanishes.

If your input mechanism or a 3rd party library is conflating them, then  
you can add a business/conversion layer to convert empty to null and all  
your code can ignore the empty case and simply concentrate on checking for  
null, as it should already do - because this is unavoidable in any case.

This is KISS, collapse the 2 possible "error" states into 1 and check for  
that.
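As a minimal sketch of such a conversion layer in D (the function name emptyToNull is made up for illustration, it is not an existing library call), the decision to treat empty as "no data" can be encoded once, at the point where input enters your code:

```d
// Hypothetical helper: collapse "empty" into "null" at the input boundary,
// so the rest of the code only ever has to check for null.
string emptyToNull(string s)
{
    return s.length == 0 ? null : s;
}
```

Everything behind that boundary can then use "if (value is null)" as the single "no data" check.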

>> People also seem to elevate empty strings to some sort of special  
>> status, that's like saying 0 has some special status for int - it  
>> doesn't it's just one of a number of possible values.
>>
>> In fact, int having no null like state is a "problem" causing solutions  
>> like boxing to elevate the value type to a reference in order to allow  
>> a null state for int.
>
> You want to check ints for null everywhere too?

No. (Strawman).  There are, however, some cases where people wrap int in  
nullable, as there are some use cases where you do want to be able to  
indicate "no data" using a single variable.  This is the flexibility of a  
reference type, and the cost is the check for null.  If you do a  
cost/benefit analysis for int with this in mind it is clearly not a type  
we want as a reference type - the performance penalty alone kills this.

>> Yet, in D we've decided to inconsistently remove that functionality  
>> from string for no gain.  If string could not actually be null then  
>> we'd gain something from the limitation, instead we lose functionality  
>> and gain nothing - you still have to check your strings for null in D.
>
> Huh? Null slices work just like empty ones - that's why this topic was  
> started in the first place. One doesn't have to check slices for nulls,  
> only for length.

Slices are not strings, as slices cannot be null.  However "if (slice is  
null)" can still be true - this is just plain wrong/inconsistent.  Let's  
pick a side and handle it consistently, above all else.  We can argue  
about which side, but can we at least agree the inconsistency is a bad  
thing?
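To make the inconsistency concrete, here is a small sketch of the behaviour as I understand it in current D (the function name is only for illustration).  Null and empty compare equal with ==, yet "is null" still tells them apart:

```d
// Illustration only: how null and empty strings/slices behave today.
void showNullVersusEmpty()
{
    string a = null;       // no data at all
    string b = "";         // data present, but empty
    string c = "abc";
    string d = c[0 .. 0];  // empty slice of existing data

    assert(a is null);
    assert(b !is null);    // the literal "" has a non-null pointer
    assert(d !is null);    // d still points into c

    // ...and yet all three look identical to == and to .length
    assert(a == b && b == d);
    assert(a.length == 0 && b.length == 0 && d.length == 0);
}
```

So the null state exists and is observable, but the usual comparisons cannot be relied on to preserve it.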

> If you want clear nullable semantics, you have Nullable, it works for  
> everything, including strings and ints. You would want this feature only  
> in rare cases, so it doesn't make sense to make it default, or it will  
> be a nuisance.

Strings can be null, not checking for null is fatal.  You cannot easily  
tell if you have a string or a slice so you currently have to check for  
null in most/all cases already.  We're paying that "cost" already and yet  
not getting the full benefit from it.  It's simply a bad investment.  D  
should pick a side and conform to it, either we have nullable strings or  
we don't.  The current middle ground is just worse.

>>> both of them are just "no data", so you end up typing  
>>> if(string.IsNullOrEmpty(mystr)) every time everywhere.
>>
>> I only have to code like this when I use 3rd party code which has  
>> conflated empty and null.  In my code when it's null it means not  
>> specified, and empty is just one type of value - for which I do no  
>> special handling.
>
> Equivalence between null and empty is a business logic's requirement,  
> that's why it's done.

Whose business logic?  This is perhaps my secondary point here.  D has no  
grounds to define business logic for all possible applications, this is  
something each application must have the flexibility to define for  
itself.  A library ought to provide the tools to do it - converting "" to  
null for you - but the language should not mandate it.

>>> And, yeah, only one small feature in this big mess ever needs to  
>>> differentiate between null and empty.
>>
>> Untrue, null allows many alternate and IMO more direct/obvious designs.
>
> The need for those designs is rare and trivially implementable for all  
> value types.

Rare; untrue, I use null all the time to good effect.  Trivially  
implementable, debatable - if you have to do more work you're paying a  
price, if you get no reward for that price then you're wasting resources.   
The current situation in D has you paying the price for no reward.

>>> I found this one case trivially implementable, but nulls still plague  
>>> all remaining code.
>>
>> Which one case?  The readline() one below?
>
> No, it was an authentication system in third-party code for one special  
> case.

No-one is trying to say you cannot code around it, even trivially in some  
cases, but the null design would likely have been simpler still.  Coding  
around it means wasted effort and, worse still, it gained you nothing.

> I also had to specify this null value in app.config - guess how,  
> explicitly specify, not substitute missing parameter with a default.

Seems to me that if you want a config to be null, you simply omit it from  
the configuration file.  Then have the code return null for its value, to  
indicate "no data".  If it's present, and set to "" then you would be able  
to differentiate these two cases, which is essential if your business  
logic requires that "" is a valid value for the config.  D should not  
place restrictions on your business logic - with an implementation detail.
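A rough sketch of what I mean, assuming the configuration has already been loaded into an associative array (getConfig and the key names are hypothetical):

```d
// Hypothetical lookup: null means "not in the config file at all",
// while "" means "present, and deliberately set to empty".
string getConfig(string[string] config, string key)
{
    auto p = key in config;  // pointer to the value, or null if the key is absent
    return p ? *p : null;
}
```

With a config of ["banner": ""], getConfig(config, "banner") is an empty but non-null string, while getConfig(config, "missing") is null, so the caller can tell the two cases apart with "is null".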

> Another possibility for readline is to return a tuple
> {bool eof, string line(non-null)} - this way you have easy check for eof  
> and don't have to check for null when you don't need it.

Yet another more complex design, for no gain.  The additional boolean buys  
us nothing over the string reference, it costs more in terms of memory and  
complexity and you still have to remember to check it, as you have to  
remember to check for null in the original design.

>>> you're screwed or your code becomes littered with null checks, but who  
>>> accounts for all alternative scenarios from the start?
>>
>> Me, and IMO any competent programmer.  It is misguided to think you can  
>> ignore valid states, null is a valid state in C, C++, C#, and D.. You  
>> should be thinking about and handling it.
>
> Here null is a valid state for readline, not for the caller: if the  
> caller parses a multiline data format, unexpected end of file is an  
> invalid state.

If they pass a multi-line data format, and they have counted the number of  
lines prior to passing it (to verify that they can call readline() N times  
safely) then yes, calling readline and getting EOF would be unexpected and  
worthy of an exception.

But, why would you want to pay the cost of processing the lines twice (to  
count them and ensure no EOF)?  Why not just have readline do that for  
you, by returning null on EOF.  Simpler, more direct.
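A sketch of that design, using a made-up in-memory LineReader rather than any real I/O API, might look like this:

```d
// Hypothetical reader over in-memory lines; not a real library type.
struct LineReader
{
    string[] lines;
    size_t next;

    // Returns the next line, or null once the input is exhausted (EOF).
    string readLine()
    {
        if (next >= lines.length)
            return null;
        string line = lines[next++];
        // A blank line is still a line: never report it as null.
        return line is null ? "" : line;
    }
}
```

The caller then loops with "for (string line = r.readLine(); line !is null; line = r.readLine())" and needs no separate EOF flag or line count.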

> And what do you gain by littering your code with those null checks? Just  
> making runtime happy and adding noise to the code? You could use that  
> time to improve the code or add features or even relax. It's exactly  
> nullable strings, which gain you only a time waste.

In D, you already have to "litter your code with null checks" so you're  
already paying the cost, you're just not getting any benefit.

>> You don't have to check for it on every access to the variable, but you  
>> do need to check for it once where the variable is assigned, or passed  
>> (in private functions you can skip this).  From that point onward you  
>> can assume non-null, valid, job done.
>
> You just said "never assume". The assumption may fail, because the  
> string type is still nullable, compiler doesn't save you here, this  
> sucks. And in order to check for everything everywhere on a level near  
> that of the compiler, you must be not just competent, but perfect.

Play on words.  If you've filtered out null, you're not "assuming" you're  
"ensuring" it's non-null.  The only way to get null from that point is  
either "by design" or via memory corruption.  D does protect you from  
memory corruption by avoiding the need for raw pointers etc.  And, if  
you're setting string variables to null "by design" then you will need to  
check them again, of course.

Yes, if you want to write good code you need to develop good habits WRT  
using null, it's unavoidable.  Unless we remove null and the  
power/flexibility it affords - which is a valid option.  So, can we just  
pick an option for D and go with it, I don't really mind which way we go -  
tho my preference should be obvious :)

>>> I believe there's no problem domain, which would like to differentiate  
>>> between null and empty string instead of treating them as "no data".
>>
>> null means not specified, non existent, was not there.
>> empty means, present but set to empty/blank.
>>
>> Databases have this distinction for a reason.
>
> Oracle makes no distinction between null and empty string. For a reason?

Looks like it was (ultimately) a mistake:
http://docs.oracle.com/cd/B19306_01/server.102/b14200/sql_elements005.htm

<quote>Note:
Oracle Database currently treats a character value with a length of zero  
as null. However, this may not continue to be true in future releases, and  
Oracle recommends that you do not treat empty strings the same as  
nulls.</quote>

To repeat the important part.. "Oracle recommends that you do not treat  
empty strings the same as nulls".

For. A. Reason.  The database has no right to define business logic - this  
restriction in Oracle Database has no doubt caused people to have to work  
around it, by using a specific "value" as null.

> A database is an implementation detail of a data storage, it doesn't  
> implement business logic

Agree 100%.  Conflating null and empty string is a business logic decision,  
it has no place in a database or other base level - like a language or  
standard library.

>> If you get input from a user a field called "foo" may be:
>>  - not specified
>>  - specified
>>
>> and if specified, may be:
>>  - empty
>>  - not empty
>
> If the user doesn't fill a text box, it's both empty and not specified -  
> there's just no difference.

There is a clear and important difference.  Let's say the text box  
represents the user's middle name, let's presume they have given a value  
for it at some stage, and let's assume they would like to remove it.  They  
load the page, erase the value and click submit.  Your business logic will  
ignore the empty value, and not update the user's middle name.  My business  
logic will detect the text box was present (not null) and apply the given  
value "" to the user's middle name (in the database for example).
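In code, my version of that business logic could be sketched roughly as follows (applyMiddleName is a hypothetical name, and "stored" stands in for the database field):

```d
// Hypothetical update rule: null means the field was not submitted at all,
// while "" means the user deliberately cleared it.
void applyMiddleName(ref string stored, string submitted)
{
    if (submitted is null)
        return;            // field absent from the request: leave the value alone
    stored = submitted;    // field present, possibly empty: apply it
}
```

If the input layer hands over "" for an erased text box and null for a missing one, the erase is applied; if it conflates the two, this distinction cannot be expressed at all.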

> And it doesn't matter how you store it in the database - as null or as  
> empty string - both are presented in the same way.

They don't have to be, that is my point.  The decision of how to display  
them is a business logic decision and having a clear distinction between  
null and empty allows you to display them differently.  Not having the  
distinction, ties your hands.

> Heck, we use these optional text boxes everywhere - can you tell if  
> their content is empty or not specified?

HTTP is one such input mechanism which conflates null and empty, and there  
are numerous ways to code around it.  D is making the same mistake, with  
the same consequences; this is my central point.

> And what if the value is required? Would you accept an empty value?

This is a business logic decision, which D, and the database have no right  
to make.  Yes, if the user could input an empty value and yes if my  
business logic wanted to detect and disallow it - I would.  If not, I  
would not.  The point is that null gives you the power to express both,  
rather than restricting you and forcing an indirect solution to code  
around the lack.

>> If we have null, lets use it, if we want to remove null the lets remove  
>> it, but can we get out of this horrid middle ground please.
>
> *sigh* people just don't buy the KISS principle...

No kidding.  From my perspective null /is/ KISS and having to code around  
the lack with a more complex design is not.  :P

R
