Best way to clear dynamic array for reuse

Thu Jul 14 02:12:50 PDT 2016

On Thursday, July 14, 2016 07:07:52 Miguel L via Digitalmars-d-learn wrote:
> Ok, i have read about Appender and assumeSafeAppend(), but i am
> still a bit confused.
> What i have understood is: dynamic arrays are (almost) always
> reallocating when appending to them except assumeSafeAppend() is
> used, or when wrapping them with Appender. So, if i'm sure there
> are no slices referencing my array i can use assumeSafeAppend().
> But is this permanent? I mean if I declare:
>   array A[] x;
>   x.assumeSafeAppend();
>
> Does that last forever so I can append without reallocating after
> emptying array x, or should I call assumeSafeAppend() every time
> I adjust x.length or append to x?
>
> Maybe I should give up trying to use dynamic arrays and use fixed
> length arrays instead.

If you haven't read it yet, I'd suggest reading

http://dlang.org/d-array-article.html

A dynamic array is basically

struct DynamicArray(T)
{
    size_t length;
    T* ptr;
}

So, it does not own or manage its own memory. It can refer to any memory,
but it's the GC that manages determining whether appending would need to
reallocate or not. And if it does reallocate, then the memory is going to be
GC-allocated regardless of whether it was GC allocated originally or
malloc-ed memory or a slice of a static array or whatever.

When the GC allocates a block of memory for a dynamic array, it keeps track
of the farthest point in that block of memory that any dynamic array refers
to. If the last element in a dynamic array is the last element in that
memory block, and there is additional capacity beyond that within the memory
block, then appending will not result in a reallocation. Rather, it will
expand that dynamic array into the free space in the memory block. But if
that memory block is fully used or if the last element in the dynamic array
is not the last used element in the memory block (or if the dynamic array
refers to memory that was not allocated by the GC), then there is no room
for that array to be expanded, and a reallocation will occur, at which point
that dynamic array _will_ point to the last used element in its new memory
block. So, if you doing something like

int[] arr;
arr ~= 10;
arr ~= 9;
arr ~= 42;
arr ~= 17;
arr ~= 42;
arr ~= 99;
arr ~= 0;
arr ~= 100;

it could be that only one of those append operations actually result in an
allocation (the first one), or it could be that multiple do. How many of
them require a reallocation depends on what size the underlying buffer is,
and that's implementation dependent. If you want to know how many elements
can be appended without reallocating, then use capacity. For instance, on my
machine, this code

import std.stdio;

void main()
{
    int[] arr;
    writefln("len: %s, cap: %s", arr.length, arr.capacity);
    arr ~= 10;
    writefln("len: %s, cap: %s", arr.length, arr.capacity);
    arr ~= 9;
    writefln("len: %s, cap: %s", arr.length, arr.capacity);
    arr ~= 42;
    writefln("len: %s, cap: %s", arr.length, arr.capacity);
    arr ~= 17;
    writefln("len: %s, cap: %s", arr.length, arr.capacity);
    arr ~= 42;
    writefln("len: %s, cap: %s", arr.length, arr.capacity);
    arr ~= 99;
    writefln("len: %s, cap: %s", arr.length, arr.capacity);
    arr ~= 0;
    writefln("len: %s, cap: %s", arr.length, arr.capacity);
    arr ~= 100;
    writefln("len: %s, cap: %s", arr.length, arr.capacity);
}

prints

len: 0, cap: 0
len: 1, cap: 3
len: 2, cap: 3
len: 3, cap: 3
len: 4, cap: 7
len: 5, cap: 7
len: 6, cap: 7
len: 7, cap: 7
len: 8, cap: 15

So, you can see that a reallocation occurred on the 1st, 4th, and 8th append
operations, and that won't reallocate again until the 16th append operation
(since it can grow to a length of 15 before a reallocation would be
required). The capacity grows in a manner similar to that of std::vector in
C++ or ArrayList in Java. And it's reasonably efficient in terms of memory
allocations (amortized O(1)). The reason that Appender often gets used is
that checking the capacity of the dynamic array is not cheap (since that's
kept track of by the GC and not the dynamic array itself), and that has to
be checked every time that you append. Appender is able to play some games
to make that more efficient under the assumption that what you're doing is
simply appending to build an array after which you would take it out of the
Appender and not use Appender anymore. But Appender does not change the
allocation scheme. It just makes the checks more efficient. So, use Appender
when you're first building an array, but don't keep using it after that.

What gets more entertaining in terms of capacity and reallocations is when
you start slicing the array or changing its length. For instance, if we add

    arr.length = arr.length - 1;
    writefln("len: %s, cap: %s", arr.length, arr.capacity);

to the end of the previous example, then you get

len: 0, cap: 0
len: 1, cap: 3
len: 2, cap: 3
len: 3, cap: 3
len: 4, cap: 7
len: 5, cap: 7
len: 6, cap: 7
len: 7, cap: 7
len: 8, cap: 15
len: 7, cap: 0

Notice that the capacity is now 0. That's because the dynamic array no
longer refers to the last, used element in the underlying memory buffer.
Similarly, if you instead did

import std.stdio;

void main()
{
    int[] arr;
    writefln("arr len: %s, cap: %s", arr.length, arr.capacity);
    arr ~= 10;
    writefln("arr len: %s, cap: %s", arr.length, arr.capacity);
    arr ~= 9;
    writefln("arr len: %s, cap: %s", arr.length, arr.capacity);
    arr ~= 42;
    writefln("arr len: %s, cap: %s", arr.length, arr.capacity);
    arr ~= 17;
    writefln("arr len: %s, cap: %s", arr.length, arr.capacity);
    arr ~= 42;
    writefln("arr len: %s, cap: %s", arr.length, arr.capacity);
    arr ~= 99;
    writefln("arr len: %s, cap: %s", arr.length, arr.capacity);
    arr ~= 0;
    writefln("arr len: %s, cap: %s", arr.length, arr.capacity);
    arr ~= 100;
    writefln("arr len: %s, cap: %s\n", arr.length, arr.capacity);

    auto arr2 = arr;
    writefln("arr len: %s, cap: %s", arr.length, arr.capacity);
    writefln("arr2 len: %s, cap: %s\n", arr2.length, arr2.capacity);

    auto arr3 = arr[0 .. $ - 1];
    writefln("arr len: %s, cap: %s", arr.length, arr.capacity);
    writefln("arr2 len: %s, cap: %s", arr2.length, arr2.capacity);
    writefln("arr3 len: %s, cap: %s\n", arr3.length, arr3.capacity);

    arr2 ~= 77;
    writefln("arr len: %s, cap: %s", arr.length, arr.capacity);
    writefln("arr2 len: %s, cap: %s", arr2.length, arr2.capacity);
    writefln("arr3 len: %s, cap: %s", arr3.length, arr3.capacity);
}

it prints out

arr len: 0, cap: 0
arr len: 1, cap: 3
arr len: 2, cap: 3
arr len: 3, cap: 3
arr len: 4, cap: 7
arr len: 5, cap: 7
arr len: 6, cap: 7
arr len: 7, cap: 7
arr len: 8, cap: 15

arr len: 8, cap: 15
arr2 len: 8, cap: 15

arr len: 8, cap: 15
arr2 len: 8, cap: 15
arr3 len: 7, cap: 0

arr len: 8, cap: 0
arr2 len: 9, cap: 15
arr3 len: 7, cap: 0

Notice that only whichever dynamic arrays refer to the last element within
that memory block has a non-zero capacity. So, appending one of those won't
result in a reallocation so long as the memory block isn't full, but
appending to any of the others _will_ result in a reallocation, and if you
append to a dynamic without reallocating, any other dynamic arrays which are
slices of the same memory block and which had a non-zero capacity will no
longer have a non-zero capacity, because they no longer refer to the last,
used element in the underlying memory block.

So, if what you're doing is passing around dynamic arrays which are slices
of one another left and right and appending to them will-nilly, then you're
going to get a lot of reallocations, but if you're just appending to dynamic
arrays which refer to the last element in their memory block, and you don't
append to any of the other dynamic arrays which are slices of that same
block, then you'll get only occasional reallocations.

But it sounds like what you're trying to do is something like

    auto arr = [99, 45, 33, 22, 19, 46];
    writefln("arr len: %s, cap: %s", arr.length, arr.capacity);

    arr.length -= 2;
    writefln("arr len: %s, cap: %s", arr.length, arr.capacity);

    arr ~= 99;
    writefln("arr len: %s, cap: %s", arr.length, arr.capacity);

which on my machine results in

arr len: 6, cap: 7
arr len: 4, cap: 0
arr len: 5, cap: 7

It reallocated 99 was appended, because arr didn't refer to the last used
element in the memory block. To fix that, you use assumeSafeAppend. e.g.

    auto arr = [99, 45, 33, 22, 19, 46];
    writefln("arr len: %s, cap: %s", arr.length, arr.capacity);

    arr.length -= 2;
    writefln("arr len: %s, cap: %s", arr.length, arr.capacity);

    arr.assumeSafeAppend();
    writefln("arr len: %s, cap: %s", arr.length, arr.capacity);

    arr ~= 99;
    writefln("arr len: %s, cap: %s", arr.length, arr.capacity);

which prints

arr len: 6, cap: 7
arr len: 4, cap: 0
arr len: 4, cap: 7
arr len: 5, cap: 7

on my machine. It didn't reallocate. Instead, assumeSafeAppend told the GC
that the last element that arr refered to within its memory block was the
last used element in the memory block. So, suddenly, all of the space after
arr was available, and its capacity was non-zero. So, if you're going to be
doing a bunch of adjustments to length to remove elements, then you can use
assumeSafeAppend to then make it so that the GC understands that the
elements after that array are non longer used and that appending can grow
the array into that space, which will significantly reduce the number of
reallocations in the case where you keep removing elements from the end of
the array. _However_, the very large caveat is that if you do this, you
cannot have any other dynamic arrays which refer to any elements past the
end of arr, because if such dynamic arrays do exist, then suddenly, their
values are gonig to be stomped on by the append operations to arr (and it
might even be that those elements had their destructors called when
assumesafeAppend was called - I don't know; if so it's that much worse if
you tell it that it's safe to append when it actually isn't).

So, whether you should be using Appender or assumeSafeAppend or neither
depends entirely on what you're doing. However, in general, simply appending
to dynamic arrays does not result in many reallocations (just like it
doesn't result in a lot of realloctions for std::vector or ArrayList). When
reallocations become a problem is when you start slicing a dynamic array so
that you have other dynamic arrays which refer to the same memory, and you
append to those dynamic arrays, or when you reduce the length of an array
and then append to it, because in both of those cases, you're appending to
dynamic arrays which do not refer to the last element in their underlying
memory block.

Hopefully, that makes things at least somewhat clearer.

- Jonathan M Davis