GC BlkAttr clarification. Programming in D pages 671, 672. About GC

H. S. Teoh hsteoh at qfbox.info
Wed Sep 3 23:05:45 UTC 2025


On Wed, Sep 03, 2025 at 07:56:03PM +0000, Brother Bill via Digitalmars-d-learn wrote:
> It appears that D has multiple ways of determining which created
> objects are subject to GC.

No.  The GC knows which memory address ranges it manages, and any
pointer that falls outside of those ranges will be ignored, since it
cannot possibly point to GC-allocated memory.
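To make the range check concrete, here is a minimal sketch using GC.addrOf from druntime's core.memory. It returns the base address of the GC block that a pointer points into, or null if the pointer is not into GC-managed memory. The helper `isGcManaged` and the variable `tlsGlobal` are hypothetical names for illustration:

```d
import core.memory : GC;

int tlsGlobal;   // module-level variable: thread-local static storage, not GC heap

// True if p points into a block owned by the GC.
bool isGcManaged(const void* p)
{
    return GC.addrOf(cast(void*) p) !is null;
}

void main()
{
    int local;
    int* heap = new int(42);            // GC-allocated

    assert(isGcManaged(heap));
    assert(!isGcManaged(&local));       // stack memory
    assert(!isGcManaged(&tlsGlobal));   // static/TLS data
}
```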


> For class instances, new creates a new class instance, returning a
> pointer to it.  While "alive", other pointers may also point to the
> class instance.  Once no pointers are pointing to this class instance,
> it may be garbage collected.

Correct.


[...]
> C, C++ and D can play shenanigans with pointers, such as casting them
> to size_t, which hides them from the GC.

D's current GC is conservative, meaning that any value it sees that
looks like it might be a pointer value will be regarded as a pointer
value.

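There is also an optional precise GC, which can be enabled with options
compiled into the program or passed on the command line.  Its scheme is
somewhat less conservative: it relies on type information to locate
pointers inside heap blocks, although the stack is still scanned
conservatively.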
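For example, druntime accepts GC options at run time (`./app --DRT-gcopt=gc:precise`) or compiled into the program through an `rt_options` array. A minimal sketch of the compiled-in form, assuming a reasonably recent druntime:

```d
import core.memory : GC;

// Compiled-in runtime option: use the precise GC for this program.
// (Run-time equivalent: ./app --DRT-gcopt=gc:precise)
extern(C) __gshared string[] rt_options = ["gcopt=gc:precise"];

void main()
{
    // Heap blocks are now scanned using type information where available;
    // the stack and registers are still scanned conservatively.
    auto p = new int(1);
    assert(GC.addrOf(cast(void*) p) !is null);
}
```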


[...]
> GC.calloc can allocate memory for a slice of MyClass instances.  The
> developer may run GC.free to free the allocated memory.  But GC may
> perform its own garbage collection of GC allocated memory blocks.

I might also add that D's GC *does not run* unless the user program
tries to allocate GC memory. Unlike languages such as Java, where the GC
may be implicitly triggered by the runtime in the background, D's GC
lies dormant until you perform a GC allocation and it decides that there
isn't enough free memory left and it's time to run a collection cycle.

IOW, if you don't want the GC to run, simply don't allocate any more GC
memory, and no collection cycles will run (unless you call it yourself
via GC.collect).
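One way to see this in practice, assuming a druntime that provides GC.profileStats(): code marked @nogc cannot allocate GC memory, so running it cannot trigger a collection, while GC.collect() requests one explicitly. `sumSquares` and `collections` are made-up helper names:

```d
import core.memory : GC;

// @nogc: the compiler rejects any GC allocation inside this function,
// so calling it can never trigger a collection cycle.
@nogc nothrow int sumSquares(const(int)[] xs)
{
    int s = 0;
    foreach (x; xs)
        s += x * x;
    return s;
}

// Number of collection cycles the GC has run so far.
size_t collections() nothrow
{
    return GC.profileStats().numCollections;
}

void main()
{
    auto data = new int[](1000);        // all GC allocation happens up front
    foreach (i, ref x; data)
        x = cast(int) i;

    immutable before = collections();
    long total = 0;
    foreach (_; 0 .. 10_000)
        total += sumSquares(data);      // no GC allocation in this loop...
    assert(collections() == before);    // ...so no collection could have run

    GC.collect();                       // explicitly ask for a cycle
}
```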


> Let's look at each attribute:  (confirm if my analysis is right,
> otherwise correct)
> 
> FINALIZE - just before GC reclaims the memory, such as with GC.free,
>            call destructors, aka finalizers.

This bit is probably best left untouched by user code, and left to the
runtime to figure out when/how to use it.
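A quick way to confirm that the runtime already handles this: allocate an instance of a class that has a destructor with plain `new`, then inspect its block attributes with GC.getAttr. `Resource`, `finalizedCount` and `attrsOf` are hypothetical names:

```d
import core.memory : GC;

__gshared int finalizedCount;

class Resource
{
    ~this() { ++finalizedCount; }   // a destructor means the GC must finalize
}

// Attributes of the GC block holding a class instance.
uint attrsOf(Object o)
{
    return GC.getAttr(GC.addrOf(cast(void*) o));
}

void main()
{
    auto r = new Resource;
    // `new` set FINALIZE by itself; nothing here touched the flags.
    assert(attrsOf(r) & GC.BlkAttr.FINALIZE);
}
```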


> NO_SCAN - There may be false positives regarding byte values that look
> like 'new' allocated pointers.  This can result in 'garbage' memory
> not being collected.  If we are CERTAIN that this memory block doesn't
> contain any pointers to 'new' SomeClass allocated memory, then mark as
> NO_SCAN.

Correct.  Though if you're writing idiomatic D code, you'll almost never
need to worry about this.  Whenever you allocate an array whose elements
are PODs (without any pointers), the allocator will automatically mark
the memory NO_SCAN so that the GC doesn't waste time scanning such
blocks.  So things like implicit string allocations will be marked
NO_SCAN, etc.  If you're allocating an array or object that contains
indirections, then NO_SCAN will not be set, so the GC will scan the
interior of such blocks for pointers to other live objects.
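A minimal sketch that checks what `new` chose for two arrays, one of plain bytes and one of pointers, using GC.addrOf and GC.getAttr from core.memory (`blockAttrs` is a hypothetical helper; it looks the block up by its base address):

```d
import core.memory : GC;

// GC attributes of the block backing an array, looked up via its base address.
uint blockAttrs(T)(T[] arr)
{
    return GC.getAttr(GC.addrOf(cast(void*) arr.ptr));
}

void main()
{
    ubyte[] bytes = new ubyte[](64);   // elements cannot hold pointers
    int*[]  ptrs  = new int*[](8);     // elements are pointers

    assert(blockAttrs(bytes) & GC.BlkAttr.NO_SCAN);     // set automatically
    assert(!(blockAttrs(ptrs) & GC.BlkAttr.NO_SCAN));   // must be scanned
}
```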


>           Question 1: if GC-calloc has allocated MyClass that has a
>           string 'name' member, which may expand in size, would we
>           still properly apply NO_SCAN?

You need to understand that string members are pointer/size pairs.  The
content of the string is never stored inside the object's memory block
itself.

An empty string does not have any associated allocated memory, and when
you assign to the string, whether or not it will be scanned depends on
where it came from.  If it came from a string literal, it will be in the
program's static memory, and the GC never scans that (neither does it
have any GC flags like NO_SCAN).  If the string comes from GC-allocated
memory, then that memory by default will have been marked NO_SCAN
because string data is assumed not to contain any pointer values, only
character values. (This is why it's a bad idea to mask a pointer by,
e.g., converting it into a string representation and storing it inside a
string.  Because the GC won't scan such strings, it may mistakenly
collect a live object thinking that it's dead, if the only references to
the object are inside such strings.)
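A small sketch of the three cases, using a `Student` class with a `string` member to echo the question (the class and the helpers `attrsAt` and `isGc` are hypothetical): a literal's characters are not GC memory at all, a GC copy of the characters is NO_SCAN, and the object holding the string is scanned because it contains the string's pointer.

```d
import core.memory : GC;

class Student
{
    string name;   // stored in the object as a (length, pointer) pair
}

// GC attributes of the block containing p, or 0 if p is not GC memory.
uint attrsAt(const void* p)
{
    auto base = GC.addrOf(cast(void*) p);
    return base is null ? 0 : GC.getAttr(base);
}

bool isGc(const void* p)
{
    return GC.addrOf(cast(void*) p) !is null;
}

void main()
{
    auto s = new Student;

    s.name = "literal";                  // characters live in static data
    assert(!isGc(s.name.ptr));

    s.name = "Alice".idup;               // characters copied into a GC block
    assert(isGc(s.name.ptr));
    assert(attrsAt(s.name.ptr) & GC.BlkAttr.NO_SCAN);   // never scanned

    // The object's own block holds name.ptr, so the GC must scan it.
    assert(!(attrsAt(cast(void*) s) & GC.BlkAttr.NO_SCAN));
}
```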


>           Question 2: if GC-calloc has allocated MyClass, which may
>           allocate new MyStudent(...), would that mean 'don't apply
>           NO_SCAN'?

It's very simple.  If a memory block may contain pointers, then it
should not be NO_SCAN.  If a memory block never contains any pointers,
then it can (should) be marked NO_SCAN.

When does a memory block contain pointers?  When the object that lives
in it contains references, such as references to other objects, to
GC-allocated strings, arrays, etc.  If the object only contains PODs,
then there are no pointers and it may be safely marked NO_SCAN.
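To make the rule concrete for memory you allocate yourself, here is a sketch that uses std.traits.hasIndirections to ask whether a type can contain pointers. `gcAllocFor`, `Point` and `Node` are hypothetical, and this only approximates what `new` does (the runtime also handles finalization, among other things):

```d
import core.memory : GC;
import std.traits : hasIndirections;

struct Point { double x, y; }            // plain data: no pointers
struct Node  { int value; Node* next; }  // contains a pointer

// Hypothetical helper: allocate one T in GC memory, marking the block
// NO_SCAN only when T cannot contain pointers.
T* gcAllocFor(T)()
{
    immutable uint attrs = hasIndirections!T ? GC.BlkAttr.NONE : GC.BlkAttr.NO_SCAN;
    auto p = cast(T*) GC.calloc(T.sizeof, attrs);
    *p = T.init;
    return p;
}

void main()
{
    Point* pt = gcAllocFor!Point();
    Node*  nd = gcAllocFor!Node();
    assert(GC.getAttr(cast(void*) pt) & GC.BlkAttr.NO_SCAN);
    assert(!(GC.getAttr(cast(void*) nd) & GC.BlkAttr.NO_SCAN));
}
```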

But again, I'd like to repeat that user code rarely needs to bother with
NO_SCAN or other GC flags.  The default implementation of `new` will
automatically do the right thing for you.  You only need to fiddle with
GC flags if you're doing something unusual, like emplacing an object in
memory you manually allocated (as opposed to memory allocated by `new`),
or if you're dealing with void[] arrays (possibly constructed
externally) where the runtime doesn't know the actual type of the data.

Normal D code does not need to fiddle with GC flags.
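For the void[] case, a sketch of marking the block yourself with GC.setAttr, assuming you really do know the buffer will never hold pointers (`makeRawBuffer` and `attrsOf` are hypothetical helpers). GC.addrOf is used so the attribute is applied to the block's base address:

```d
import core.memory : GC;

// Hypothetical: a void[] buffer that we *know* will only ever hold raw bytes
// (say, file contents).  The runtime cannot tell that from the type, so we
// tell the GC ourselves that the block needs no scanning.
void[] makeRawBuffer(size_t n)
{
    void[] buf = new void[](n);
    GC.setAttr(GC.addrOf(buf.ptr), GC.BlkAttr.NO_SCAN);
    return buf;
}

// GC attributes of the block backing a slice.
uint attrsOf(const(void)[] s)
{
    return GC.getAttr(GC.addrOf(cast(void*) s.ptr));
}

void main()
{
    void[] raw = makeRawBuffer(256);
    assert(attrsOf(raw) & GC.BlkAttr.NO_SCAN);
}
```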


> NO_MOVE - For GC.realloc, if increasing memory allocated, and it's not
> available, throw 'MEMORY_NOT_AVAILABLE' exception.

Correct. You might want to use this flag if you have non-D code that
might be holding pointers to this memory block, e.g., if you passed a
pointer to some D array to C code which retains it in some C-managed
pointer, and the C code expects the array to still be there later.

It's not very often that such situations come up, though.  When passing
GC-allocated data to C code, it's generally a good idea to keep a
reference to it inside D code so that the GC can find the reference
anyway.  Since D doesn't have a moving GC, this is really all you need
to do.  Again, unless you're doing something unusual, you probably don't
need to touch the NO_MOVE flag.
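When the C side really is the only long-term holder, one alternative to keeping a D-side reference (a substitution, not something mentioned above) is druntime's GC.addRoot/GC.removeRoot, which registers such a reference with the GC explicitly. A sketch, with C-managed storage simulated by malloc; `CHolder`, `giveToC` and `takeBackFromC` are hypothetical:

```d
import core.memory : GC;
import core.stdc.stdlib : free, malloc;

// Simulated C-side storage: memory from C's malloc is never scanned by the GC,
// so a pointer kept only here would not keep a GC block alive by itself.
struct CHolder
{
    void* ptr;
}

// Hand a GC block to "C", registering it as a GC root first.
CHolder* giveToC(void* gcBlock)
{
    GC.addRoot(gcBlock);
    auto h = cast(CHolder*) malloc(CHolder.sizeof);
    assert(h !is null, "malloc failed");
    h.ptr = gcBlock;
    return h;
}

// When "C" is done, unregister the root and release the C-side holder.
void takeBackFromC(CHolder* h)
{
    GC.removeRoot(h.ptr);
    free(h);
}

void main()
{
    int[] data = new int[](100);
    CHolder* h = giveToC(data.ptr);
    data = null;                  // no D reference left...
    GC.collect();                 // ...but the root keeps the block alive
    (cast(int*) h.ptr)[0] = 1;    // still valid memory
    takeBackFromC(h);
}
```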


> APPENDABLE - For D internal runtime use.  Don't mark this yourself.

Yes.


> NO_INTERIOR - This says that only the base address of the block may be
> a target address of other GC allocated pointers.  All other possible
> pointers are 'false' pointers.

Correct.  This flag might be useful if you know that your program will
only ever point to the head of the block, and you wish to optimize GC
performance by letting it skip over values that look like pointers, but
aren't (because they point to the interior of a NO_INTERIOR block, so
the GC knows it can't be a real pointer value).  But I'd recommend not
bothering with this flag unless you have a GC performance issue, and
you're sure that this specific situation is the cause of said
performance issue.  Otherwise you're just wasting time playing with GC
flags that don't really make a significant difference.
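A sketch of when one might opt in, assuming a large buffer that the program only ever refers to through its base pointer. The core.memory docs describe NO_INTERIOR as implemented only for blocks at least a page in size, and the flip side is that the base pointer must stay reachable, since an interior pointer alone will not keep the block alive. `Table` and `makeTable` are hypothetical:

```d
import core.memory : GC;

// Hypothetical large lookup table, referenced only through its base pointer.
struct Table
{
    double* base;     // the one reference the GC needs to see
    size_t length;

    double opIndex(size_t i) const { return base[i]; }
    void opIndexAssign(double v, size_t i) { base[i] = v; }
}

Table makeTable(size_t n)
{
    // NO_SCAN: doubles contain no pointers.
    // NO_INTERIOR: pointer-like values into the middle of this block are
    // not treated as references to it.
    auto p = cast(double*) GC.malloc(n * double.sizeof,
                                     GC.BlkAttr.NO_SCAN | GC.BlkAttr.NO_INTERIOR);
    return Table(p, n);
}

void main()
{
    auto t = makeTable(1 << 16);              // 512 KiB, well over a page
    foreach (i; 0 .. t.length)
        t[i] = i * 0.5;
    assert(t[10] == 5.0);
    assert(GC.addrOf(cast(void*) t.base) is cast(void*) t.base);  // base is the block start
}
```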


[...]
>               Question 3: How is this different from NO_SCAN?

NO_SCAN means don't look for pointer values inside the memory block.

NO_INTERIOR means ignore pointers (outside this block) that point to the
interior of this block.


> Perhaps I am missing the fundamentals of various D garbage collectors.
[...]

It's really very simple.  D uses a mark-and-sweep collector. That means
that at the start of every GC collection cycle, it starts with a set of
roots: pointers that represent active references to memory, such as CPU
registers, pointers on the runtime stack, global variables that contain
pointers, etc.  Then it recursively walks these pointers, and every
GC-allocated object that it reaches will be marked as live. The contents
of these objects are scanned for more pointers to other objects, and so on.
At the end of the cycle, any object that isn't marked live is
unreachable from the program's roots, so it must be dead and can be
collected.
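The cycle described above can be modeled in a few lines. This is a toy over an explicit object graph, not how druntime's collector is implemented (the real one scans raw memory); `Obj`, `mark` and `markAndSweep` are hypothetical. Note that the unreachable cycle is collected too:

```d
import std.algorithm : filter;
import std.array : array;

// Toy heap object: a name plus outgoing references.
class Obj
{
    string name;
    Obj[] refs;
    bool marked;
    this(string name) { this.name = name; }
}

// Mark phase: everything reachable from o is live.
void mark(Obj o)
{
    if (o is null || o.marked)
        return;
    o.marked = true;
    foreach (r; o.refs)
        mark(r);          // "scan" the object's contents for more references
}

// Runs one cycle and returns the survivors; unmarked objects are "collected".
Obj[] markAndSweep(Obj[] heap, Obj[] roots)
{
    foreach (o; heap)
        o.marked = false; // start of cycle: nothing is known to be live
    foreach (r; roots)
        mark(r);
    return heap.filter!(o => o.marked).array;   // sweep
}

void main()
{
    auto a = new Obj("a"), b = new Obj("b"), c = new Obj("c"), d = new Obj("d");
    a.refs = [b];
    c.refs = [d];
    d.refs = [c];         // c <-> d form a cycle, but no root reaches them

    auto live = markAndSweep([a, b, c, d], [a]);
    assert(live.length == 2 && live[0] is a && live[1] is b);
}
```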

D's GC is conservative by default, meaning that it does not assume
anything about the structure of data in allocated blocks. Any
pointer-size-aligned integer value that looks like it could be a pointer
will be treated as a pointer.  (The docs officially discourage
deliberately storing pointers in integer variables, though, since this
would break the optional precise GC that *does* make certain assumptions
about where pointer values might be located inside allocated blocks.)
Of course, as I said earlier, the GC knows which memory ranges it
manages, so any pointer values that are outside of these ranges will be
ignored as irrelevant.
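A toy illustration of that conservative rule: treat every pointer-sized word as a possible pointer and keep only the ones that fall inside GC-managed memory, using GC.addrOf as the range test. `candidatePointers` is a hypothetical name, and the real collector performs this check internally against its own pools rather than through this API:

```d
import core.memory : GC;

// Toy conservative scan: treat every word as a potential pointer and keep
// the ones that land inside memory the GC manages.
size_t[] candidatePointers(const(size_t)[] words)
{
    size_t[] found;
    foreach (w; words)
        if (GC.addrOf(cast(void*) w) !is null)
            found ~= w;
    return found;
}

void main()
{
    int* p = new int(7);
    // An ordinary integer, a pointer disguised as an integer, and zero.
    size_t[] words = [42, cast(size_t) p, 0];
    auto hits = candidatePointers(words);
    assert(hits.length == 1 && hits[0] == cast(size_t) p);
}
```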

The various GC flags are simply hints that let you influence the
scanning process to some extent. The NO_SCAN bit means that upon
reaching this block, don't bother scanning its contents to find more
pointers (because there are none). The NO_INTERIOR bit means that if the
GC finds a pointer-like value that looks like it points to the inside of
this block, ignore it as a non-pointer, because pointers to this block
only ever point to its head (the supposed pointer is actually not a real
pointer, but an integer value that happens to have a pointer-like
value).

The other flags have very specific uses; if you don't know what they
actually do, you probably don't need them and shouldn't touch them.


T

-- 
Bomb technician: If I'm running, try to keep up.

