A couple of questions about arrays and slices

Wed Jun 21 04:52:06 UTC 2023

On Tuesday, June 20, 2023 8:09:26 PM MDT Cecil Ward via Digitalmars-d-learn 
wrote:
> First is an easy one:
>
> 1.) I have a large array and a sub-slice which I want to set up
> to be pointing into a sub-range of it. What do I write if I know
> the start and end indices ? Concerned about an off-by-one error,
> I have start_index and past_end_index (exclusive).

When slicing, the end index is exclusive. e.g.

    auto a = "0123456789abcdef";
    assert(a[3 .. 8] == "34567");

As a consequence of that, the end index minus the start index is the length
of the resulting slice. It also means that $ always refers to one past the
end of the array.

> 2.) I have a dynamic array and I wish to preinitialise its alloc
> cell to be a certain large size so that I don’t need to
> reallocate often initially. I tell myself that I can set the
> .length property. Is that true?

Well, dynamic arrays have a length and a capacity. The length is the actual
length of the slice that you're operating on. e.g.

auto a = new int[](100);
assert(a.length == 100);

or

int[] a;
a.length = 100;
assert(a.length == 100);

However, the underlying memory that the dynamic array is a slice of will
generally be larger than the length of the array - in part because of how
the allocator works and in part so that appending to the array will be more
efficient. So, you can see how much space the dynamic array has to grow
before it has to be reallocated when appending to it by using the capacity
property. E.G. When I run this on my system

auto a = new int[](100);
writeln(a.length);
writeln(a.capacity);

I get

100
127

This means that I could append 27 more elements to the array (or otherwise
increase its length by up to 27 elements) without needing a reallocation.
But if the array tries to grow beyond that, then the GC will allocate a new
block of memory, copy the elements over to that, and adjust the slice to
point to the new block of memory (of course leaving any other slices
pointing to the old block of memory).

One thing to note about that is that a dynamic array can only have a
capacity that's larger than its length if it's the furthest slice into that
block of memory, and no other slices have gone further into it (since it's
only safe for the runtime to allow you to append to a dynamic array in-place
when no other dynamic array can possibly refer to the memory after the array
that you're trying to append to). If the dynamic array is not at the end,
then it ends up with a capacity of 0. E.G.

auto a = new int[](100);
auto b = a[0 .. $ - 1];
assert(b.length == 99);
assert(b.capacity == 0);

So, capacity is a bit more complex than the max length that the array could
grow to without needing a reallocation, but you can use it see if appending
to a dynamic array will result in a reallocation.

Another part of this that you of course need to remember is that dynamic
arrays can refer to memory that is not GC-allocated for dynamic arrays (be
it stack-allocated or malloc-allocated or even GC-allocated for a different
purpose), in which case, the capacity will be 0, because the underlying
memory block is not a GC-allocated memory block for dynamic arrays, and
appending anything to the array then has to result in a reallocation so that
it then does point to a GC-allocated block of memory for dynamic arrays.

If you want to create a dynamic array of a specific length while also
ensuring that that there is at least a specific amount of memory in the
underlying memory block for it to grow into, then you need the reserve
function. It allows you to tell the GC how much memory you would like to be
allocated underneath the hood. E.G.

int[] a;
a.reserve(500);
assert(a.length == 0);
writeln(a.capacity);

prints 511 on my system.

https://dlang.org/spec/arrays.html#capacity-reserve

All that being said, if you're trying to minimize reallocations when
appending to a dynamic array, you should consider using
https://dlang.org/phobos/std_array.html#Appender since it has some other
tricks to make it so that it queries the GC less and speeds the whole
process up. But the basic idea of reserving capacity is the same, and when
you're done appending, you get a normal dynamic array out of the deal.

> 2a.) And what happens when the cell is extended, is the remainder
> zero-filled or remaining full of garbage, or is the size of the
> alloc cell something separate from the dynamic array’s knowledge
> of the number of valid elements in it ?

A dynamic array is basically this:

struct DynamicArray(T)
{
    size_t length;
    T* ptr;
}

It's just a slice of memory and has no clue whatsoever about what it points
to. All of the logic for that comes from the druntime functions that you
call to operate on dynamic arrays (e.g. ~= or capacity). It could be a slice
of a static array, GC-allocated memory, malloced memory, etc. And the GC
doesn't do anything to keep track of all of the dynamic arrays. They're
basically just simple structs sitting on the stack or inside of the memory
of whatever data structure they're a part of. To know what to free, the GC
just scans them along with everything else when it does a collection.

Now, if the memory that the dynamic array points to is a block of
GC-allocated memory allocated for dynamic arrays (as it would be if you used
new or assigned a value to its length property), then there are some minor
bookkeeping items at the end of that memory block IIRC. In particular, the
GC needs to know the furthest into that block of memory that any dynamic
array has referred to so that it can know whether it's safe to append to a
dynamic array referring to that block of memory. But all of that is
implementation-specific stuff that you don't really need to worry about.

Now as for garbage, no @safe code will ever have garbage in any memory that
it can access (barring @trusted being misused anyway). Whenever you
explicitly increase the length of a dynamic array rather than appending to
it, the new elements are all initialized with the element type's init value
- which is why you can't have dynamic arrays of structs that @disable their
init value.

If what you're worried about is the memory in the memory block that the
dynamic array is a slice of which is beyond the current length of the array,
then IIRC, it's just zeroed out no matter what the type is, but I'd have to
go digging in druntime to be sure. Either way, you can't access any of that
memory in @safe code (it would require using pointers with pointer
arithmetic rather than slices to access it).

Actually, a quick test should show us quite easily, and on my system

int[] a;
a.reserve(10);
writeln(*a.ptr);
writeln(*(a.ptr + 1));
writeln(*(a.ptr + 2));
writeln("----");
a ~= 47;
writeln(*a.ptr);
writeln(*(a.ptr + 1));
writeln(*(a.ptr + 2));

printed

655142960
8
609419264
----
47
8
609419264

on one run, and

682106928
8
634187776
----
47
8
634187776

on another. So, the memory isn't being zeroed out, and it looks like it's
probably garbage, though the fact that the second spot is 8 in both cases
implies that the GC is doing something rather than it all just being
garbage. Regardless, you don't really have to worry about how valid that
memory is, since you can't access it in @safe code, and if it _does_ ever
get initialized with anything, it'll just be bit-blitted.

And of course, if your main concern here is appending speed, then you should
probably be using Appender rather than directly appending to a dynamic
array.

Hopefully all of that answers your questions or at least helps clarify
things.

- Jonathan M Davis