A GC Memory Usage Experiment: `--DRT-gcopt=heapSizeFactor` is Magic
FeepingCreature
feepingcreature at gmail.com
Fri Dec 9 10:18:07 UTC 2022
We have a process that balloons up to 5GB in production. That's a
bit much, so I started looking into ways to rein it in.
tl;dr: Add `--DRT-gcopt=heapSizeFactor:0.25` if you don't care
about CPU use, but want to keep RAM usage low.
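For reference, druntime reads these options at startup: besides passing `--DRT-gcopt` on the command line, a default can be compiled into the binary via druntime's `rt_options` hook. A minimal sketch (the command-line flag still takes precedence over the compiled-in value):
```
// Compiled-in default for druntime's runtime options; a
// --DRT-gcopt flag on the command line still overrides it.
extern(C) __gshared string[] rt_options = ["gcopt=heapSizeFactor:0.25"];

void main()
{
    // ... program as usual ...
}
```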
# Setup
This is a process that heavily uses `std.json` decoded into an
object hierarchy in a multithreaded setup to load data from a
central store.
I let it run until it had loaded all its data, then recorded the
RES value from top.
Each configuration was run three times and averaged. Note that
actual numbers are from memory because I lost the data in a freak
LibreOffice crash (first time I've had Restore Document outright
_fail_ on me), but I don't think they're more than ±100MB off.
`heaptrack-d` (thanks Tim Schendekehl, your heaptrack branch
keeps on being useful) says that the used memory is about 1:2
split between internal allocations and std.json detritus. It also
shows the memory usage sawtoothing up and down by ~2.5, several
times during startup, as expected for a heavily GC-reliant
process.
# Premise
I have a standing theory (the "zombie stack hypothesis") that the
D GC can leak memory by dead references that are falsely kept
alive because they're not properly cleared from the stack by
successive calls. For example:
```
import core.thread : Thread;
import core.time : seconds;

void main() {
    void foo() {
        Object obj = new Object;
    }
    foo();
    // foo has returned, but obj is still "live" because its slot
    // sits right above main's stack frame and is never overwritten
    void bar() {
        // somehow a gap arises in bar's stack frame?
        void* ptr = void;
        Thread.sleep(600.seconds);
    }
    bar();
}
```
Now `obj` is dead but will live for at least 10 minutes, because
its pointer value will never be erased.
It is unclear how much this actually happens in practice. To probe it, I tried DMD's `-gx` flag (stack stomping), which ought to suppress the effect.
All builds targeted x86-64: DMD 2.100.2, LDC2 1.30.0.
# Results
- DMD stock: 3.8GB
- DMD `-gx`: 3.4GB
- LDC stock: 3.1GB
- LDC `--DRT-gcopt=heapSizeFactor:0.25`: 800MB!!
- LDC with `"--DRT-gcopt=heapSizeFactor:0.25 gc:precise"`: 800MB
# Analysis
DMD stock loses by a massive margin, even compared to LDC stock. It's unclear what is going on there. One hypothesis is that LDC makes denser use of the stack than DMD, which would explain its advantage; but that hypothesis predicts that DMD `-gx` should be *equal* to LDC. In fact, even against DMD with `-gx`, LDC (without `-gx`!) still wins by a good margin.
It's important to note that these values carry significant noise. Because GC runs are sparse, the result may be sensitive to where exactly in the sawtooth pattern the measurement was taken.
However, as we averaged over multiple runs, LDC still seems to
have an advantage here that neither noise nor the zombie stack
hypothesis can fully explain.
Now to the big one: `heapSizeFactor` is **massive**. For some reason, running the GC vastly more often yields a `>2x` benefit. That shouldn't even be possible: the default `heapSizeFactor` is already 2, so the stock 3.1GB heap implies roughly 1.5GB of live data, which is the lowest we should have been able to get.
It is possible that *something* about simply running the GC more
often helps it clean up dead values more effectively. The zombie
stack hypothesis has *some* opinions on this: maybe if a thread
happens to be idle when the GC runs, its low stack size helps the
GC discover that references that would usually be seen as
fake-alive are really dead? Am I using this hypothesis for
everything because that's all I got? MAYBE!!
**Caveat**: The LDC run with `heapSizeFactor` was also 2x-3x
slower than without. This is okay in our case because the process
in question spends the great majority of its lifetime sitting at
~5% CPU anyway.
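If you want to put a number on that CPU cost rather than eyeball it, druntime can report collection statistics at runtime. A minimal sketch using `core.memory.GC.profileStats` (available in reasonably recent druntimes):
```
import core.memory : GC;
import std.stdio : writeln;

void main()
{
    // churn through some garbage so the GC has work to do
    foreach (i; 0 .. 100_000)
        cast(void) new int[](64);

    auto stats = GC.profileStats();
    writeln("collections:      ", stats.numCollections);
    writeln("total pause time: ", stats.totalPauseTime);
}
```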
Interestingly, the precise GC provided no benefit. My understanding is that precise garbage collection only considers fields in heap-allocated areas that are actually pointers, rather than merely classifying whole allocations as "may contain pointers" vs. "pointer-free."
If so, the reason this provided no advantage may be because we're
running on 64-bit: it's much less likely than on 32-bit that a
random non-pointer value would alias a pointer. If the zombie
stack hypothesis holds up, the major benefit would come from
precise stack scanning, rather than precise heap scanning:
because memory is cleared on allocation by default, undead values
on the heap are inherently much less likely. However, precise
stack scanning is not implemented in any D compiler.
(Zombie heap leaks would arise only when a data structure like an
array or hashmap is downsized in place without clearing the
now-free fields.)
There is an open question as to what the tradeoff is for different values of `heapSizeFactor`. Ostensibly, values smaller than 1 should make no difference (a factor of 1 being approximately "run the GC on every allocation"), but that doesn't seem to be how it works; there appears to be some internal smoothing of the actual target value at work as well.
In any case, we will keep `--DRT-gcopt=heapSizeFactor` in mind as
our front-line response for processes using excessive amounts of
memory.