Arrays passed by almost reference?

Thu Nov 12 10:58:56 PST 2009

gzp Wrote:

> I think problem is that, dynamic arrays and slices are NOT the same. 

I agree with most of what you wrote, but I can't see that in the current implementation.

> They have a common subset of interfaces (length, at, slice(maybe)), but 
> they are just different. An array owns it's element, it can 
> resize/remove, etc. the underlying structure. Ex A special array that 
> stores the elements in a tree can add remove nodes at any time, but a 
> slice of this "array" cannot alter the tree - only the elements (For 
> example a AVL-tree cannot have a slice that modifies the element as it 
> would require a restructuring of the tree, element modification can be 
> performed only through the array itself)
> I know int[] is a much simpler storage, but it makes my point more 
> understandable.
> 
> So now back to int[]
> According to the current implementation
> 
> foo(ref int[]) is an array: It can add/remove elements (restructure the 
> tree)

I don't think so: that is a reference to a slice.

> foo(int[]) is a mixed thing. It is a slice with an automatic copy 
> feature. It is not an array nor a slice.

It is a pass-by-value slice. Now there is one more slice that provides access to what the argument has been providing access to.

> If it would be a slice, than the underling struct could not be altered, 
> but here it can be.

D2's "slice" is different than some other languages'. :)

> It does not own its elements as it cannot resize the underlying 
> structure without penalty.

Agreed.

> Actually it is a slice + copy_on_resize.

Yes.

> When you modify the elements, 
> it alters the element of the referred array, but when you resize it, it 
> copies (or not, depending on the DMD implementation!!!) the elements 
> into another array and creates a new slice+copy_on_resize object for 
> this array.

That is the "discretionary" part in the semantics.

But, we must find a different entity (GC? druntime?) that makes the copies; becaus that entity is the owner of the elements.

> Thus foo(int[]) is worse than the dangling pointers or buffer overwrite 
> errors from C(C++). Semantically they are correct, so bug produced from 
> resized int[] cannot be detected using tricks like patterns in the 
> memory. It is something that can cause really-really nasty bugs. 
> Especially when the copy of the original array on resizing depend on the 
> dmd implementation. (depends on how much extra datas are allocated for 
> an array before they are really reallocated).
> 
> 
>  From my point of view it's quite rare to have an array that is 
> temporally extended with elements (partially enabling to modify to old 
> ones), then  forget the new ones.

I don't understand the use either. But I can see how it is important for performance.

But we may not know at that time whether the new ones will not be used. The new slice may be copied to another one.

I agree though that I can't see any use case.

> So please don't favor a feature in the 
> core language that is either hardly used and causes bugs, that can be 
> hardly  detected.
> 
> So my proposal is that (as been already mentioned by others):
>   - have array those contains the element + structure,
> 	ex RandomAccessArray!(int) AVLTree
>   - have slices (ranges) of array, where the structure cannot be altered
>   - decide if int[] stands for a short version for either 
> RandomAccessArray!(int) or Range!(int), but DO NOT have a mixed meaning, 
> that can be altered/modified with const/immutable/ref qualifiers.

Good proposals.

My views on the current semantics:

I recently took it a challenge to define the semantics of the current dmd implementation of "dynamic consequtive objects" (I don't want to call them slices or dynamic arrays.) I've posted my views this weeek...

First two objections (not to you, but to the current nomenclature):

1) I disagree that D2 provides dynamic arrays to the programmer. The dynamic nature of the elements are maintained on the background; but the programmers never lay their hands on dynamic arrays.

2) D2's slices are not the same thing as in other languages. Still, I will call what the programmer receives "slices" below

To illustrate, let's have a look at the following definition:

  int[] slice = new int[10];

- side effect: 10 objects are created
- returned value: a slice to all of those objects

It gets interesting:

  int[] slice2 = slice[1..$-1];

Now we have two entities that provide access to the underlying objects. This is a "sharing relationship." In this sense, the two share the access to those objects.

The interesting part is that, either party can leave this relationship at will... As soon as they see unfit, they will go elsewhere and start providing access to copies of these object.

Neither party owns these objects. The garbage collector does.

Because, if we say that 'slice' was the owner and now went away, then is 'slice2' owning the objects? Has it been promoted to a "dynamic array?"

I think not. For that reason, I see no difference between "dynamic arrays" and "slices" in D2. Neither owns the objects; they provide access.

I describe this as "discretionary sharing semantics."

> Some random thoughts:

I am not sure whether the following are your proposals for change, but I tested them with the current implementation and they fit in the semantics as I understand.

> [1,2,3,4] literal is the array itself. It could alter the structure, if
> it would not be an immutable object.
> 
> int[] a = [1,2,3,4] a is a slice of the array, cannot be resized.

It can be resized with 2.036 and fits my definition. As we append objects to 'a', it may get new copies to provide access to.

> int[] a = new int[100]; is a slice too,

Agreed: side effect is 100 element creation, return value is a slice.

> int[100] b;
> int[] a = b; is a short version for a copy: a = RandomAccassArray!(int)(

Disagreed: b is a fixed-sized array and 'a' is a slice that provides access to its objects. 'a' may terminate this sharing contract at will as it sees unfit.

> a.array is the referred array, thus a.array ~= 2 could resize the array,
> 1. the slice a itself is not modified, it still points to the original
> subset - but then what about the removed elements ???
> 2. the slice a is automatically resized to point to the altered structure
> 3. slices of static arrays they cannot be resized

Reading those, I think you've been proposing. My attempt is to define the *current* semantics as of 2.036.

> const int[] cannot resize the underlying array
> int[] can
> 
> int[new] is the array itself (i'm not sure)
> int[new] a = new int[100];

I haven't learned about T[new] yet, but I think it is discontinued. (?)

Ali