Best way to clear dynamic array for reuse

Thu Jul 14 04:09:53 PDT 2016

On Thursday, July 14, 2016 09:56:02 Miguel L via Digitalmars-d-learn wrote:
> Thank you Jonathan, that really cleared up a lot of things, I
> read the article. But I still have this doubt: is
> assumeSafeAppend() changing a property of the array as "this
> array is never going to be referenced by any other slice, you can
> append or change its length any time and it is never going to be
> reallocated unless it's out of free space"? or it is more like
> "adjust capacity after last operation" so I should be calling it
> whenever I am adjusting length or before appending?

_All_ that a dynamic array is is

struct DynamicArray(T)
{
    size_t length;
    T* ptr;
}

All of properties of a dynamic array are calculated by the GC. So, an
operation like assumeSafeAppend is not doing anything to the array itself
but to the memory block (or rather the metadata associated with the memory
block) that the GC keeps track of. Think of the memory block in the GC as
being something like

struct MemoryBlock(T)
{
    T* start;
    size_t length;
    T* farthestUsed;
}

That's not actually what it looks like, but it should help you understand.
And now let's assume that we somehow have access to this memory block as the
variable memBlock, and you get something like

auto arr = [15, 19, 22, 7, 2];
assert(arr.length == 5);
assert(arr.ptr == memBlock.start);
assert(arr.ptr + arr.length == memBlock.farthestUsed);
assert(arr.capacity == memBlock.length);

If you append to arr, then memBlock.farthestUsed gets adjusted, and you get
something like

arr ~= 42;
assert(arr.length == 6);
assert(arr.ptr == memBlock.start);
assert(arr.ptr + arr.length == memBlock.farthestUsed);
assert(arr.capacity == memBlock.length);

If you then slice arr so that it doesn't refer to the first element in the
memory block, then you'd get something like

arr = arr[1 .. $];
assert(arr.length == 5);
assert(arr.ptr == memBlock.start + 1);
assert(arr.ptr + arr.length == memBlock.farthestUsed);
assert(arr.capacity == memBlock.length - (arr.ptr - memBlock.start));

If you change the array's length so that it has one fewer elements on the
end, then you get something like

--arr.length;
assert(arr.length == 4);
assert(arr.ptr == memBlock.start + 1);
assert(arr.ptr + arr.length + 1 == memBlock.farthestUsed);
assert(arr.capacity == 0);

Because arr.ptr + arr.length != memBlock.farthestUsed, the capacity is now
0. So, appending to arr would then require a reallocation. However, if you
call assumeSafeAppend, then farthestUsed is adjusted, and you get something
like

// Sets memBlock.farthestUsed to arr.ptr + arr.length
arr.assumeSafeAppend();
assert(arr.length == 4);
assert(arr.ptr == memBlock.start + 1);
assert(arr.ptr + arr.length == memBlock.farthestUsed);
assert(arr.capacity == memBlock.length - (arr.ptr - memBlock.start));

So, the main thing that assumeSafeAppend has done is adjust the field that
goes with the memory block which indicates the farthest point in the memory
block that a dynamic array has ever grown to. The GC doesn't try and figure
out which dynamic arrays might currently refer to that memory. It has no
clue and doesn't care. During a collection, it'll check whether anything
points to that memory block and free it if nothing does, but during normal
operations, no attempt is made to determine how many dynamic arrays refer to
a particular memory block or where in that memory block they might point.
All the GC needs to keep track of in order to avoid having arrays stomp on
one another when they grow is the farthest that any dynamic array has grown
into that memory block. And assumeSafeAppend is telling the GC to change
that value to point to the last element in the array that you call it on
rather than wherever it pointed to before.

So, if there actually are any other dynamic arrays referring to the memory
past the end of the array that you called assumeSafeAppend on, then you've
screwed them up, and growing that array will stomp on them, causing bugs.
However, if there really aren't any other dynamic arrays referring to the
memory past the end of that array, then you're fine. Appending won't require
a reallocation until there is no more free memory in the block for the array
to grow into.

However, if you ever change the length of the array again, taking elements
off the end, once again, that array's capacity will be 0, and appending to
it will reallocate.

So, if you want to remove an element from the end of an array and then
append without reallocating, you will need to call assumeSafeAppend after
_every_ time that you reduce the array's length. e.g.

--arr.length;
arr.assumeSafeAppend();

And that will work great as long as there are no other dynamic arrays
referring to the element that was just removed from the array. If you have
something like

auto arr2 = arr;
--arr.length;
arr.assumeSafeAppend();

then that's very bad, because if you then append to arr, then it will be
overwriting the last element in arr2. Additionally, if assumeSafeAppend
results in the destructor being called on the element that was removed from
the array (I'm not sure if it does or doesn't), then accessing the last
element of arr2 without appending to arr would result in an unsafe
operation, because that element would be in an invalid state.

So, shriking an array and calling assumeSafeAppend to be able to grow the
array without reallocating (until the memory buffer is full anyway) is fine
- but only so long as you make sure that you don't have any other dynamic
arrays referring to the memory past the end of that array.

- Jonathan M Davis