resizeable arrays: T[new]

Bruno Medeiros brunodomedeiros+spam at com.gmail
Sun Jun 10 06:03:37 PDT 2007


Oskar Linde wrote:
> Bruno Medeiros skrev:
>> Walter Bright wrote:
>>> Sean Kelly wrote:
>>>> Walter Bright wrote:
>>>>> (The problem is if I pass a slice to a function, and then that 
>>>>> function reallocates the slice by changing the length or appending 
>>>>> to it, it can affect other slices into the same data in very 
>>>>> hard-to-explain ways.
>>>>
>>>> I'm not sure I understand how a reallocation could affect other 
>>>> slices into the same data.  If a reallocation occurs, that 
>>>> reallocation will be into entirely new memory, provided the slice 
>>>> doesn't start at the head of a memory block.  Is it the condition I 
>>>> just mentioned that's a problem?  If so, I suppose it is one way to 
>>>> trick the compiler into thinking the reference no longer refers to 
>>>> const data, when in fact it does.  If not, could you point me in the 
>>>> right direction?
>>>
>>> Given:
>>>
>>>     int[] a = new int[7];
>>>     int[] b = a[1..6];
>>>     b[1] = 2;   // writes through to a[2]
>>>     b.length = 4;
>>>     b[1] = 3;   // writes through to a[2]
>>>     b.length = 6;
>>>     b[1] = 4;   // does not write through to a[2]
>>>
>>
>> So, is that the problem we are trying to solve? Why not simply
>> make it so that setting .length *always* creates a new array?
> 
> Not really an option. That would make ~= O(n^2) instead of O(n).
> 
> 

Hum, I didn't recall that. However, I still wouldn't say it's not an 
option. You're asking us to choose between performance (significant only 
in some cases, like realloc'ing in loops) and correctness/"safeness". 
Safeness, because setting the length is not safe in the general case: it 
is only safe if you know how the array was allocated and, when 
increasing the length, that no other references to it exist. It seems to 
me that it should be left to the compiler, through code analysis, to 
decide whether the in-place realloc optimization can be safely performed 
(i.e., without changing program semantics).
However, since this would take quite some work to implement in the 
compiler, and since some people may want to know for sure whether the 
optimization is applied, we could have an alternative syntax (instead of 
setting the length) that performs an in-place realloc. It could be 
another property (ar.inlength = 42), a method (ar.realloc(42)), or some 
other syntax.
This would mean that the concatenation operators would always dup the 
array, so when you wanted to resize in place you couldn't use them. That 
would be, I think, the only difference versus the two-array-types 
solution. But given the rarity of in-place resizing, does not being able 
to use the concatenation operators justify adding a new array type to 
the language?
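As an aside, for readers who know Go better than D: Go slices have essentially the same value-type semantics, so Walter's example above translates almost line for line. In Go, appending past a slice's capacity plays the role of growing `.length`; this is an illustrative analogue of the hazard under discussion, not D code.

```go
package main

import "fmt"

// sliceAliasing mirrors Walter's D example with Go slices and
// returns the final value of a[2].
func sliceAliasing() int {
	a := make([]int, 7)
	b := a[1:6] // len 5, cap 6: shares a's backing array
	b[1] = 2    // writes through to a[2]
	b = b[:4]   // shrinking never reallocates
	b[1] = 3    // still writes through to a[2]
	b = append(b, 0, 0, 0) // grows past cap 6: reallocates into new memory
	b[1] = 4    // no longer writes through to a[2]
	return a[2]
}

func main() {
	fmt.Println(sliceAliasing()) // prints 3
}
```

Whether the write lands in `a` depends entirely on whether the grow step happened to reallocate — exactly the "hard-to-explain" behavior Walter describes.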


>> Hum, now that I think about it, this may not only be a better 
>> solution, it may be the *only* real solution. Even, with the new 
>> resizable/unresizable array proposal you can have the same problem as 
>> above:
>>
>>   int[new] a = new int[7];
>>   int[new] b = a;
>>   b.length = 6;
>>
>>   b[1] = 2;   // writes through to a[1]
>>   b.length = 4;
>>   b[1] = 3;   // writes through to a[1]
>>   b.length = 10;
>>   b[1] = 4;   // does not write through to a[1]
>>
>> Why does this problem exist? Because, setting an array length may or 
>> may not create a new array, and there isn't a way to know which will 
>> happen.  Well, in that code segment we can see when it will resize, 
>> but we can't know in the general case (like when we receive an array 
>> as a function parameter). This makes setting .length a very unsafe 
>> operation, if not plain *broken*.
> 
> That's why my suggestion was making T[new] a reference type instead of a 
> value type, just like T[U] has become.
> 
> The associative array used to have a similar issue. Previously (before D 
> 0.151):
> 
> int[int] a = ...;
> int[int] b = a;
> b[0] = 0;
> 
> could potentially corrupt a...
> 
> 
> And an analogue for todays dynamic array:
> 
> int[] a = ...;
> int[] b = a;
> b.length = 1;
> b ~= ...;
> 
> will corrupt a.
> 
> 
> The solution would be identical: make the resizable array a reference 
> type. Today's value-type behavior for dynamic arrays is perfect for 
> slices, just not for resizable arrays.
> 
> Another option, if one wishes to resolve the above issues and have 
> T[new] remain a value type, would be to insert an implicit .dup on 
> assignment for T[new]. That way, there would never be two resizable 
> arrays sharing the same ptr.
> 
> /Oskar

Yeah, I guess those two would also be solutions, although I can't tell 
right now whether those approaches would have other implications.
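Oskar's shrink-then-append corruption scenario also has a direct analogue in Go, which kept the value-type slice design. A hedged illustration (Go, not D): after shrinking, an append that still fits within the shared capacity writes straight over the other slice's elements.

```go
package main

import "fmt"

// shrinkThenAppend mirrors Oskar's "b.length = 1; b ~= ..." example
// and returns the resulting contents of a.
func shrinkThenAppend() []int {
	a := []int{1, 2, 3}
	b := a[:1]        // "b.length = 1": still shares a's storage, cap 3
	b = append(b, 99) // fits within capacity, so it writes in place...
	_ = b
	return a // ...and a is now [1, 99, 3]: a has been corrupted
}

func main() {
	fmt.Println(shrinkThenAppend()) // prints [1 99 3]
}
```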

But in any case, even with two array kinds, and however the resizable 
kind works, why do the non-resizable arrays need to be non-resizable at 
all? They could be resizable, with the resize operation always 
allocating a new array (and consequently also allowing concatenation). 
You gain language functionality and lose *nothing*.
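A minimal sketch of that always-allocate resize, again as a Go analogue (the helper name is hypothetical): because the result never shares storage with its input, resizing cannot write through to any other slice of the original data.

```go
package main

import "fmt"

// resizeAlwaysCopy models a ".length" that always allocates: the
// result never aliases s, so no other reference to the old storage
// can be affected by later writes.
func resizeAlwaysCopy(s []int, n int) []int {
	t := make([]int, n)
	copy(t, s) // copies min(len(s), n) elements
	return t
}

func main() {
	a := []int{1, 2, 3}
	b := resizeAlwaysCopy(a, 5)
	b[0] = 9
	fmt.Println(a[0], b[0]) // prints 1 9: a is untouched
}
```

The cost is the O(n^2) repeated-append behavior Oskar points out, which is why loops that grow an array would still want a separate in-place primitive such as the ar.realloc(42) suggested earlier.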

-- 
Bruno Medeiros - MSc in CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D



More information about the Digitalmars-d-announce mailing list