Can assumeSafeAppend() grab more and more capacity?

Wed Jun 7 12:57:22 PDT 2017

On 6/7/17 3:56 AM, Biotronic wrote:
> On Wednesday, 7 June 2017 at 05:43:06 UTC, ag0aep6g wrote:
>> [snip]
>
> It seems to me this is a topic worthy of a more in-depth article. If
> only I felt up to that. :p

Your understanding and explanation is excellent actually!

>
> When you create a slice 'a' in D (with the current GC and druntime, at
> least), what happens behind the scenes is the allocator chops off a
> block of N bytes, where N is some number larger than
> a.length*typeof(a[0]).sizeof. For an array of two ints, N is 16.
> For good measure, let's make a copy 'b' of that slice (it will come in
> handy later):
>
> int[] a = [1, 2];
> int[] b = a;
>
> import std.stdio;
> writeln(a.capacity);
>> 3
> writeln(b.capacity);
>> 3
>
> The capacity is 3. Intriguing, as a block of 16 bytes should be able to
> hold 4 ints.
>
> We can ask the GC for more info on this block:
>
> import core.memory;
> auto info = GC.query(a.ptr);
> writefln("0x%x, %s, %s ", info.base, info.size, info.attr);
>> 0x2211010, 16, 10
>
> That's the pointer to the start of the block, the size of the block, and
> various attributes (appendable, e.g.).
> We can get the raw data of the block:
>
> auto block = (cast(ubyte*)info.base)[0..info.size];
> writeln(block);
>> [1, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8]
>
> We can see our 1 and 2 in there, and a curious 8 at the end. That's the
> currently used data, in bytes. That's also the reason the capacity is 3,
> not 4 - this info has to live somewhere. If we were to append another
> element, and print the data again:
>
> a ~= 3;
> writeln(block);
>> [1, 0, 0, 0, 2, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 12]
>
> See how the last byte changed to a 12? That just so happens to be
> a.length*int.sizeof.

To be more specific, for blocks <= 256 bytes, 1 byte is reserved at the 
end of the array to store the length. For blocks > 256 bytes and <= 2048 
bytes, 2 bytes are reserved at the end of the block to store the length 
of the array. For larger blocks, those are PAGE size and larger, and 
they have a special feature. Such blocks are not limited to a power of 
2, and can be extended literally in-place by tacking on additional 
PAGEs. I wanted to just put a size_t at the end, but my problem with 
this is that the length then would move around as you appended or shrunk 
blocks. Given how the runtime works, it's possible that 2 threads could 
be potentially appending at the same time to a shared array, so I 
decided to store it at the beginning of the block instead.

I would actually like to replace this mechanism with one that stores the 
length outside the block and into a separate memory space, as it is 
horrible for caches. Allocate a page of bytes, and you actually get 2 
pages - size_t.sizeof.

Note, for types with destructors, more bytes are reserved to store the 
type info of the array elements. I didn't write that part, so I'm not 
100% sure how it works.

> Now for a curious thing: what happens to a's capacity when we append to b?
>
> b ~= 4;
> writeln(a.capacity);
>> 3
>
> As above, the length of a in bytes equals the used bytes in the
> allocated memory block, and so both a and b have capacity again.

Yes, for this reason, calling assumeSafeAppend is unsafe and can *never* 
be part of the normal treatment of arrays. It is on you, the programmer, 
to ensure that no references to the no-longer-allocated portion of the 
array exist. The compiler can't ensure it, a library function can't 
ensure it, and they are similar to dangling pointers.

Only a library type which encapsulates the array completely can use 
assumeSafeAppend.

For example, imagine if these are immutable arrays, you have now 
overwritten immutable data!

-Steve