Passing dynamic arrays

Wed Nov 10 07:02:42 PST 2010

On Tue, 09 Nov 2010 15:13:55 -0500, Pillsy <pillsbury at gmail.com> wrote:

> Steven Schveighoffer Wrote:
>
>> On Tue, 09 Nov 2010 08:14:40 -0500, Pillsy <pillsbury at gmail.com> wrote:
> [...]
>> > Ah! This is a lot of what was confusing me about arrays; I still  
>> thought
>> > they had this behavior. The fact that they don't makes me a good deal
>> > more comfortable with them, though I still don't like the
>> > non-deterministic way that they may copy their elements or they may
>> > share structure after you append stuff to them.
>
>> As I said before, this rarely affects code.  The common cases I've seen:
>
>> 1. You append to an array and return it.
>> 2. You modify data in the array.
>> 3. You use a passed in array as a buffer, which means you overwrite the
>> array, and then start appending when it runs out of space.
>
>> I don't ever remember seeing:
>
>> You append to an array, then go back and modify the first few bytes of  
>> the
>> array.
> I've certainly encountered situations in at least one other language  
> where standard library functions will return mutable arrays which may or  
> may not share structure with their inputs. This has been such a frequent  
> source of pain when using that language that I tend to react very  
> negatively to the possibility in any context.

Care to name names?  I want to understand this dislike of D arrays,  
because out of all the languages I've ever used, D arrays are by far the  
easiest and most intuitive to use.  I don't expect to be convinced, but at  
least we can have some debate on this, and maybe we can avoid mistakes  
made by other languages.

>> Let's assume this is a very common thing and absolutely needs to be
>> addressed.  What would you like the behavior to be?
>
> Using a different, library type for a buffer you can append to. I think  
> of "a buffer or abstract list you can cheaply append to" as a different  
> sort of type from a fixed size buffer anyway, since it so often is a  
> different type. Arrays/slices are a very basic type in D, and I'm  
> generally thinking that giving your basic types simpler, easier to  
> understand semantics is worth paying a modest cost.

There was a time when the T[new] idea was expected to be part of the  
language.  Both Andrei and Walter were behind it, and seldom does  
something not make it into the language when that happens.

It turns out, that after all the academic and theoretical discussions were  
finished, and it came time to implement, it was a clunky and confusing  
feature.  Andrei said that for TDPL he had a whole table dedicated to what  
type to use in which cases (T[] or T[new]) and he didn't even know how to  
fill out the table.

The beauty of D's arrays are that the slice and the array are both the  
same type, so you only need to define one function to handle both, and  
appending "just works".  I feel like this is simply a case of 'not well  
enough understood.'

BTW, you can allocate a fixed buffer by doing:

T[BUFSIZE] buffer;

This cannot be appended to.  It is still difficult to allocate one of  
these on the heap, which is a language shortcoming, but it can be fixed.

> [...]
>> IMO, the benefits of just being able to append to an array any time you
>> want without having to set up some special type far outweighs this  
>> little
>> quirk that almost nobody encounters.  You can append to *any* array, no
>> matter where the data is located, or whether the data is a slice, and it
>> just works.  I can't see how anyone would prefer another solution!
>
> There's a difference between appending and appending in place. The  
> problem with not appending in place (and arrays not having the  
> possibility of a reserve that's larger than the actual amount, of  
> course) is one of efficiency. Having
>
> auto s = "foo";
> s ~= "bar";
>
> result in a new array being allocated that is of length 6 and contains  
> "foobar", and assigning that array to `s`, is obviously useful and  
> desirable behavior. If the expansion can happen in place, that's a  
> perfectly reasonable performance optimization to have in the case of  
> strings or other immutable arrays. Indeed, one of the reasons that  
> functional programming and GC go together like peanut butter and jelly  
> is that together they let you get all sorts of wins in terms of  
> efficiency from shared structure.
>
> However, I've found working with languages that mix a lot of imperative  
> and functional constructs (Lisp is one, but not the only one) that if  
> you're going to do this, it's really very important that there not be  
> any doubt about when mutable state is shared and when it isn't. D is  
> trying to be that same kind of multi-paradigm language. This means that,  
> for mutable arrays, having
>
> int[] x = [1, 2, 3];
> x ~= [4, 5, 6];

To leave no doubt about whether this reallocates or not try:

bool willReallocate = x.length + 3 > x.capacity;

But I still don't understand this concept.  If you find out it's not going  
to reallocate, what are you going to do?  I mean, you have three cases  
here:

1. You *don't* want it to reallocate -- well, you can't enforce this, but  
you can use ref to ensure the original is always affected
2. You *want* it to reallocate -- use dup or ~
3. You don't care -- just use the array directly

I don't see how these three options aren't enough.

> maybe reallocate and maybe not seems like it's only really there to  
> protect people from doing inefficient things by accident when they  
> append onto the back of an array repeatedly (or to make that admittedly  
> common case more convenient). This really doesn't strike me as worth the  
> trouble. Like I said elsewhere, the uncertainty gives me the screaming  
> willies.

I hear you, but at the same time, we are talking about common and uncommon  
cases here.  D (at least in my mind) tries to be a practical language --  
make the common things easy as long as they are safe.  And the cases where  
D's arrays may surprise you are pretty uncommon IMO.

-Steve