null Vs [] return arrays

Fri Apr 1 08:52:47 PDT 2011

On Fri, 01 Apr 2011 13:38:45 +0100, Steven Schveighoffer  
<schveiguy at yahoo.com> wrote:

> On Fri, 01 Apr 2011 06:38:56 -0400, Regan Heath <regan at netmail.co.nz>  
> wrote:
>
>> On Mon, 28 Mar 2011 17:54:29 +0100, bearophile  
>> <bearophileHUGS at lycps.com> wrote:
>>> Steven Schveighoffer:
>>>
>>>> So essentially, you are getting the same thing, but using [] is  
>>>> slower.
>>>
>>> It seems I was right then, thank you and Kagamin for the answers.
>>
>> This may be slightly OT but I just wanted to raise the point that  
>> conceptually it's nice to be able to express (exists but is empty) and  
>> (does not exist).  Pointers/references have null as a (does not exist)  
>> "value" and this is incredibly useful.  Try doing the same thing with  
>> 'int' .. it requires you either use int* or pass an additional boolean  
>> to indicate existence.. yuck.
>>
>> I'd suggest if someone types '[]' they mean (exists but is empty) and  
>> if they type 'null' they mean (does not exist) and they may be relying  
>> on the .ptr value to differentiate these cases, which is useful.  If  
>> you're not interested in the difference, and you need performance, you  
>> simply use 'null'.  Everybody is happy. :)
>
> The distinction is useful if you have something to reference (e.g. an  
> empty slice that points at the end of a pre-existing non-empty array).   
> But [] is a new array, no point in allocating memory just so the pointer  
> can be non-null.  Can you come up with a use case to show why you'd want  
> such a thing?

Ok.  Recently I wrote (in C) a function proxy interface.  I had to execute  
a set of functions from one thread, and wanted to 'call' them from  
potentially many.  So, I set up the thread, added events, and a queue, etc  
and I wrote a proxy function for 'calling' them from the many threads  
which looks like...

void proxy(int func, ...) {}

So, it accepts a variable list of args, places them in a structure, places
that in the queue, and waits on an event for the proxy thread to execute
the command and return the result.  Let's say the function I am executing
is a database lookup on a string field, and that the field can be NULL
(the database definition allows NULLs).  Now, let's say I want to do these
lookups:
1. lookup all objects where the field is NULL
2. lookup all objects where the field is "reganwashere"
3. lookup all objects where the field is "" (empty/non-null)

#1 and #2 are simple enough.  I call proxy like..
   proxy(LOOKUP, NULL);
   proxy(LOOKUP, "reganwashere");

and in the actual lookup function, invoked by proxy, I call:
   pFieldValue = va_arg(pArgs, char*);

and I get NULL, and "reganwashere".

In C, case #3 would also be easy, I would call proxy(LOOKUP, "") and in
the actual lookup function pFieldValue would be "" (not NULL).

But, in D it seems I cannot do this.  In D I would have to pass an  
additional boolean parameter, or add another level of indirection i.e.  
pass a string[]*.  The same problem exists in C if I want to pass an 'int'  
or any primitive type, I have to pass it as int*, use a boolean, or invent  
a 'special' value which means essentially NULL/not-set/ignored.

There are plenty of other use cases, essentially anywhere where you have  
something that can exist in one of 3 states:
   1. NULL       (not set)
   2. ""         (set, to blank)
   3. "anything" (set, to anything)

Like.. parsing input from a web page, where a field can:
   1. not be present on the page      (NULL)
   2. be present, but left blank      ("")
   3. be present, contains "anything" ("anything")

This one came up a lot when I worked with web software.  We had to be able
to detect whether the user was trying to set something to a blank string.
In some cases we wanted that to remove the setting entirely (null & ""
being identical is ok), and in others to actually set it to a blank string
(null & "" being identical is not ok).

Or... saving settings to a file from user input, where the user selects a  
setting from a menu, then enters the value and could:
   1. not select setting A, therefore save no value    (NULL)
   2. select the setting A, enter blank string         ("")
   3. select the setting A, enter the value "anything" ("anything")

Granted (and this was the response 2 years back when this topic came up) I
can "work around" the deficiency by using a map/hash/dictionary where I
insert key/value pairs, then I can ask it if the key exists.  But, this is
essentially another level of indirection like an int* or string[]* and is
more heavyweight than I might want/need.

Ultimately, and people may disagree here, I don't have a problem with  
pointers, and this is a really 'nice' feature of using pointers, and it  
seems D's arrays don't share it, which bothers me.

> Your plan would mean that [] is a memory allocation.  I'd rather not  
> have the runtime do the lower performing thing unless there is a good  
> reason.

I'm not too bothered what syntax gets used, provided it is something that
you don't accidentally use when you do not want it, and isn't too horrible
to use, as I don't see this as being a very uncommon occurrence (which
would warrant/allow ugliness of syntax).  "[]" seems logical, as does
"new T[]"; neither is "null", so the programmer was obviously trying to do
something other than pass null.

> As an alternative, you could use (cast(T *)null)[1..1] if you really  
> needed it (this also would be higher performing, BTW since the runtime  
> array literal function would not be called).

That seems to work, but it's hideous syntax for something that is not that  
uncommon IMO.

To remind myself what D does, and to try and find another way to achieve
the same thing, I wrote a test case:
--------------------
import std.stdio;

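// Returns a zero-length char[] built six different ways, selected by state.
char[] foo(int state)
{
	switch(state)
	{
	default:
	case 0:
		return null;				// explicit null
	case 1:
		return [];				// empty array literal
	case 2:
		return new char[0];			// zero-length allocation
	case 3:
		return (cast(char *)null)[1..1];	// zero-length slice at a non-null address
	case 4:
		return cast(char[])"".dup;		// copy of an empty string literal
	case 5:
		return cast(char[])""[0..0];		// zero-length slice of an empty literal
	}
}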
int main(string[] args)
{
	foreach(int i; 0..6)
	{
		char[] arr = foo(i);
		writefln("foo%d 0x%08x,%d",
			i,
			arr.ptr,
			arr.length);
	}
	return 0;
}
--------------------

Which outputs:

foo0 0x00000000,0
foo1 0x00000000,0
foo2 0x00000000,0
foo3 0x00000001,0  <- your suggestion
foo4 0x00000000,0
foo5 0x00000000,0

So, your suggestion appears to be the only way to get an empty (non-null)
array in D.

R