Can assumeSafeAppend() grab more and more capacity?

Wed Jun 7 00:56:55 PDT 2017

On Wednesday, 7 June 2017 at 05:43:06 UTC, ag0aep6g wrote:
> [snip]

It seems to me this is a topic worthy of a more in-depth article. 
If only I felt up to that. :p

When you create a slice 'a' in D (with the current GC and 
druntime, at least), what happens behind the scenes is the 
allocator chops off a block of N bytes, where N is some number 
larger than a.length*typeof(a[0]).sizeof. For an array of two 
ints, N is 16.
For good measure, let's make a copy 'b' of that slice (it will 
come in handy later):

int[] a = [1, 2];
int[] b = a;

import std.stdio;
writeln(a.capacity);
> 3
writeln(b.capacity);
> 3

The capacity is 3. Intriguing, as a block of 16 bytes should be 
able to hold 4 ints.

We can ask the GC for more info on this block:

import core.memory;
auto info = GC.query(a.ptr);
writefln("0x%x, %s, %s ", info.base, info.size, info.attr);
> 0x2211010, 16, 10

That's the pointer to the start of the block, the size of the 
block, and various attributes (appendable, e.g.).
We can get the raw data of the block:

auto block = (cast(ubyte*)info.base)[0..info.size];
writeln(block);
> [1, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8]

We can see our 1 and 2 in there, and a curious 8 at the end. 
That's the currently used data, in bytes. That's also the reason 
the capacity is 3, not 4 - this info has to live somewhere. If we 
were to append another element, and print the data again:

a ~= 3;
writeln(block);
> [1, 0, 0, 0, 2, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 12]

See how the last byte changed to a 12? That just so happens to be 
a.length*int.sizeof.

Remember how we made a copy of the slice above? The copy's 
capacity is now 0, while a's capacity is 3. The algorithm for 
capacity is actually pretty simple:

int capacity;
if (a.length*int.sizeof == block[$-1])
     capacity = (block.length - 1) / int.sizeof;
else
     capacity = 0;
writeln(capacity);
> 3

What happens when we call assumeSafeAppend?

b.assumeSafeAppend;
writeln(block);
> [1, 0, 0, 0, 2, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 8]

Hey, the 'used' byte is 8 again. That means a.capacity is 0, 
while b.capacity is 3.

Now for a curious thing: what happens to a's capacity when we 
append to b?

b ~= 4;
writeln(a.capacity);
> 3

As above, the length of a in bytes equals the used bytes in the 
allocated memory block, and so both a and b have capacity again.

This has of course overwritten a[2], which used to be 3 and is 
now 4. assumeSafeAppend breaks part of D's type system for 
optimization purposes, and this is the result.

Note that the details in this post are only correct for small 
blocks (<=256 bytes). For larger blocks, the 'used' field is 
larger, but the algorithms and concepts are the same.

For the actual implementation of a.capacity, you can have a 
look-see at 
https://github.com/dlang/druntime/blob/master/src/rt/lifetime.d#L734, which is called from https://github.com/dlang/druntime/blob/master/src/object.d#L2968.

--
   Biotronic