Does the GC consider slice lengths?

Fri Apr 1 21:11:33 UTC 2022

On Fri, Apr 01, 2022 at 01:15:01PM -0700, Ali Çehreli via Digitalmars-d-learn wrote:
> On 4/1/22 10:46, Steven Schveighoffer wrote:
> 
> >> But if it considers slices and their lengths specially, then it
> >> would see that the mid section is not referenced any more.
> >
> > It's not a consideration for the GC, but possibly the compiler. The
> > GC has no visibility into slices that hold references to its memory
> > block.  It only has knowledge of pointers to its memory block (and
> > then only indirectly via scanning).
> 
> Yes but if it recognized slices in addition to pointers, it could
> consider their lengths to reuse the middle section, no? I understand
> the role of the compiler but as GC is part of the D runtime, it feels
> like it could own the concept of slices.
[...]

The GC trades in blocks of memory.  Being conservative, it makes no
assumptions about the semantics of pointers that it encounters when it
scans for references, other than the fact that it's referencing an
allocated block of memory. It doesn't know what a slice is -- which, in
D, is stored as a struct containing a pointer and a length: the GC makes
no assumptions about the meaning of the length value that just happens
to occur adjacently to the pointer. It could be part of a slice, or it
could be a completely unrelated variable that has no relation to the
pointer; such values are disregarded.

In fact, unless precise scanning is enabled, the GC doesn't even know
which values are pointers and which are integers; it just conservatively
assumes that every integer-like value is a potential pointer. Only if
they don't look like a valid address, or point to something outside the
GC heap, or to an unallocated block, then the GC will ignore them.  This
is why in 32-bit programs these "false pointers" can sometimes cause the
GC not to collect blocks that are actually already unreferenced (you
happen to have an integer variable whose value coincidentally looks like
a pointer into the block).

So as long as one reference to the allocated block still exists, the GC
will not collect it. It cannot tell whether the program might have
stored an index into an array inside this block in an integer variable
that will later be added to the remaining reference; the GC would have
no way of telling which integer variable this might be, so it must
conservatively assume that as long as a reference to the block still
exists, the entire block is still alive (other parts of the block may
still be accessed by adding an index value, etc.).

If you really only intend on keeping a small chunk of an existing large
allocated block, one solution is to .dup or .idup the slice you want to
keep, and null all other references to the old block (or let them go out
of scope).

//

Of course, one could argue that a more precise scanning could help the
GC minimize blocks whose sole remaining reference is a slice that covers
only a small subset of the entire block. This would involve significant
complications to the scanning algorithm, though, which may have
performance implications. E.g., what constitutes a slice? Only built-in
slices? So pointer + separately-kept index is not allowed? Then we have
a potential buffer overrun bug in code that's actually not buggy, if
some code holds a pointer to the block with some separate index value,
then the GC comes along and thinks it can free the part of the block
that the index indirectly refers to.

I'm sure someone can come up with a way to solve this problem, but the
question is whether the payoff is worth the likely performance hit to GC
collection times.

T

-- 
Unix was not designed to stop people from doing stupid things, because that would also stop them from doing clever things. -- Doug Gwyn