[Issue 14467] arr.capacity sometimes erroneously returns 0

Sun Apr 19 22:11:27 PDT 2015

https://issues.dlang.org/show_bug.cgi?id=14467

Jonathan M Davis <issues.dlang at jmdavisProg.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |issues.dlang at jmdavisProg.com

--- Comment #2 from Jonathan M Davis <issues.dlang at jmdavisProg.com> ---
(In reply to Ketmar Dark from comment #1)
> it looks right to me. consider this code:
> 
> void main () {
>   auto a0 = new int[10];
>   auto a1 = a0[5..$];
>   a1 ~= 42;
>   a0 ~= 667;
> }
> 
> here if `a1.capacity` will not be 0, the last item of `a1` will become
> `667`, as druntime will reuse memory for `a0`, and that memory is already
> used by `a1`.
> 
> as there is no dataflow analysis, compiler can't tell if `arr` in your case
> is the only ref to the array. so compiler conservatively sets slice capacity
> to `0`, triggering copy-on-append behavior.

It would be the runtime, not the compiler, since it would be done at runtime.
But regardless, your example involves appending to one of the slices, whereas
Steven's does not. As I understand it, the runtime knows the farthest point
into the memory block that any slice has referred to - which in Steven's
example would be 10. It doesn't matter how many arrays refer to that block of
memory: until one of them expands into the free space at the end, the farthest
point used stays the same, and they should all have the capacity to grow into
that space. That should only change once one of them actually uses some of
that space - like in your example. Once that happens, the array which refers
to the farthest point in the block of memory has the remainder of the block as
part of its capacity, whereas the others do not - their capacity would have to
be either their length or 0 - and when one of them is appended to, it would
have to be reallocated.

But as I understand it, it doesn't matter if more than one array refers to
the end of the used portion of the memory block. It only matters whether an
array refers to the last portion used. If it doesn't, then it has no capacity
to grow. If it does, then it does, even if other arrays refer to the same
memory. And whichever array grows into that memory is the one that gets it, and
the others will have to be reallocated if they are appended to.
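
To make that concrete, here is a minimal sketch of the behavior I'm describing,
based on Ketmar's example. It assumes a druntime where this issue is fixed and
where the block allocated for `new int[10]` has spare room at the end, so that
the first append can happen in place; treat the asserts as my understanding of
the expected behavior, not as a spec.

```d
void main()
{
    auto a0 = new int[10];
    auto a1 = a0[5 .. $];   // ends at the same point in the block as a0

    // Nothing has been appended yet, so both slices end at the farthest
    // point used and both should report room to grow.
    assert(a0.capacity >= a0.length);
    assert(a1.capacity >= a1.length);

    a1 ~= 42;               // a1 claims the free space, growing in place
    assert(a1.ptr is a0.ptr + 5);

    // a0 no longer ends at the farthest point used, so it cannot grow.
    assert(a0.capacity == 0);

    auto oldPtr = a0.ptr;
    a0 ~= 667;              // has to reallocate instead of stomping on a1
    assert(a0.ptr !is oldPtr);
    assert(a1[$ - 1] == 42);
    assert(a0[$ - 1] == 667);
}
```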

capacity is a calculated property. The arrays themselves only have ptr and
length properties, and the runtime does not keep track of which slices refer
to which memory block. It only keeps track of things like the farthest point
that an array has grown into the block. capacity is calculated by looking at
the array passed to the capacity function and at the block that it refers to,
not by actually tracking the capacity of each array. That's part of why
appending can be expensive, but we're kind of stuck with that unless we make
the arrays themselves more complicated, e.g. by making them reference-counted.
That has a different set of pros and cons, and it would definitely be harder
to make work with dynamic arrays being able to refer to static arrays,
malloced buffers, and the like, which works just fine right now.
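
As a rough illustration of the last point, the sketch below slices a static
array and a malloced buffer. Since neither is a GC-managed appendable block,
I'd expect capacity to come back as 0 and an append to copy the data into
freshly allocated GC memory. The variable names are just for illustration.

```d
import core.stdc.stdlib : free, malloc;

void main()
{
    // A dynamic array referring to a static array.
    int[4] fixed = [1, 2, 3, 4];
    int[] s = fixed[];
    assert(s.capacity == 0);        // no GC block behind it to grow into

    s ~= 5;                         // copies into newly allocated GC memory
    assert(s.ptr !is fixed.ptr);
    assert(fixed == [1, 2, 3, 4]);  // the static array is untouched
    assert(s == [1, 2, 3, 4, 5]);
    assert(s.capacity >= s.length); // now it refers to a GC block

    // A dynamic array referring to a malloced buffer.
    auto p = cast(int*) malloc(4 * int.sizeof);
    assert(p !is null);
    scope (exit) free(p);

    int[] m = p[0 .. 4];
    assert(m.capacity == 0);        // the runtime knows nothing about it
}
```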

--