Unittests pass, and then an invalid memory operation happens after?

Thu Mar 28 04:46:27 UTC 2024

On Thu, Mar 28, 2024 at 03:56:10AM +0000, Liam McGillivray via Digitalmars-d-learn wrote:
[...]
> I may be now starting to see why the use of a garbage collector is
> such a point of contention for D. Not being able to predict how the
> garbage collection process will happen seems like a major problem.

If you want it to be predictable, simply:

	import core.memory;
	GC.disable();
	// ... insert your code here ...
	if (timeToCleanup()) {
		GC.collect();	// now you know exactly when this happens
	}

Of course, you'll have to know exactly how timeToCleanup() should decide
when it's time to collect.  Simple possibilities are once every N units
of time, once every N iterations of some main loop, etc.  Or use a
profiler to decide.
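
As a minimal sketch of the "once every N iterations" idea: the names
CollectEvery and runLoop are made up for illustration, and the int[]
allocation merely stands in for real work.  Only GC.disable, GC.enable
and GC.collect are actual core.memory calls.

```d
import core.memory : GC;

// Hypothetical policy: ask for a collection once every `interval` calls.
// `interval` must be non-zero.
struct CollectEvery
{
    size_t interval;
    size_t count;

    // Returns true on every `interval`-th call.
    bool timeToCleanup()
    {
        return ++count % interval == 0;
    }
}

// Runs `iterations` rounds of allocating work and returns how many
// explicit collections were requested along the way.
size_t runLoop(size_t iterations, size_t interval)
{
    GC.disable();              // suppress automatic collections
    scope (exit) GC.enable();  // restore normal behaviour on the way out

    auto policy = CollectEvery(interval);
    size_t collections;
    foreach (i; 0 .. iterations)
    {
        auto scratch = new int[](1024);  // stand-in for work that allocates
        scratch[0] = cast(int) i;
        if (policy.timeToCleanup())
        {
            GC.collect();      // the only place a collection is requested
            ++collections;
        }
    }
    return collections;
}
```

Note that, per the core.memory documentation, the runtime may still
collect while the GC is disabled if it has to, e.g. when it would
otherwise run out of memory.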

> > As mentioned, GCs do not work this way -- you do not need to worry
> > about cascading removal of anything.
> 
> Wanting to avoid the GC pauses that I hear about, I was trying to
> optimize object deletion so that the GC doesn't have to look for every
> object individually. It sounds like what I'm hearing is that I should
> just leave everything to the GC. While I can do this without really
> hurting the performance of my program (for now), I don't like this.

The whole point of a GC is that you leave everything up to it to clean
up.  If you want to manage your own memory, don't use the GC. D does not
force you to use it; you can import core.stdc.stdlib and use malloc/free
to your heart's content.
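
A small sketch of what that can look like (sumOfSquares is just an
illustrative name, not something from your code):

```d
import core.stdc.stdlib : malloc, free;

// Sums the squares 0^2 .. (n-1)^2 using a manually managed buffer.
long sumOfSquares(size_t n)
{
    // malloc returns void*; cast it and slice it to get a bounds-checked int[].
    auto p = cast(int*) malloc(n * int.sizeof);
    if (p is null)
        return -1;             // allocation failed (also possible when n == 0)
    scope (exit) free(p);      // released as soon as the function exits

    int[] buf = p[0 .. n];
    foreach (i, ref x; buf)
        x = cast(int)(i * i);

    long total = 0;
    foreach (x; buf)
        total += x;
    return total;
}
```

One caveat: the GC does not scan malloc'd memory by default, so if such
a block holds the only reference to a GC-allocated object, you need to
register it with GC.addRange (and GC.removeRange before freeing it).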

> I hope that solving the unpredictable destruction pattern is a
> priority for the developers of the language. This problem in my
> program wouldn't be happening if either *all* of the objects had their
> destructors called or *none* of them did.

Unpredictable order of collection is an inherent property of GCs. It's
not going away.  If you don't like it, use malloc/free instead. (Or
write your own memory management scheme.)

> Anyway, I suppose I'll have to experiment with either manually
> destroying every object at the end of every unittest, or just leaving
> more to the GC.  Maybe I'll make a separate `die` function for the
> units, if you think it's a good idea.
[...]

I think you're approaching this from a totally wrong angle. (Which I
sympathize with, having come from a C/C++ background myself.)  The whole
point of having a GC is that you *don't* worry about when an object is
collected.  You just allocate whatever you need, and let the GC worry
about cleaning up after you. The more you let the GC do its job, the
better it will be.

Now of course there are situations where you need deterministic
destruction, such as freeing up system resources as soon as they're no
longer needed (file descriptors, OS shared memory segments, etc.). For
these you would manage the resource manually (e.g. with a struct that
implements reference counting or whatever is appropriate).
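
For illustration, here is a sketch of the simplest variant: a
single-owner, non-copyable struct rather than a reference-counted one.
Handle and openHandles are hypothetical stand-ins for a real system
resource; for shared ownership, Phobos has std.typecons.RefCounted.

```d
// Hypothetical stand-in for a system resource such as a file descriptor.
int openHandles;   // thread-local count of currently "open" handles

struct Handle
{
    private bool active;

    // Acquire the (simulated) resource.
    static Handle open()
    {
        ++openHandles;
        return Handle(true);
    }

    @disable this(this);   // non-copyable: exactly one owner, one release

    // Runs deterministically when the struct goes out of scope.
    ~this()
    {
        if (active)
        {
            --openHandles;
            active = false;
        }
    }
}

// Returns the number of open handles observed while one is in scope.
int handlesDuringScope()
{
    int during;
    {
        auto h = Handle.open();
        during = openHandles;
    }   // h's destructor runs here, not at some later collection
    return during;
}
```

Because the struct lives on the stack, its destructor runs at the
closing brace regardless of what the GC is doing.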

As far as performance is concerned, a GC can actually have higher throughput
than manually freeing objects, because in a fragmented heap situation,
freeing objects immediately when they go out of use incurs a lot of
random access RAM roundtrip costs, whereas a GC that scans memory for
references can amortize some of this cost to a single period of time.

Now somebody coming from C/C++ would immediately cringe at the thought
that a major GC collection might strike at the least opportune time. For
that, I'd say:

(1) don't fret about it until it actually becomes a problem. I.e., your
program is slow and/or has bad response times, and the profiler is
pointing to GC collections as the cause. Then you optimize appropriately
with the usual practices for GC optimization: preallocate before your
main loop, avoid frequent allocations of small objects (prefer to use
structs rather than classes), reuse previous allocations instead of
allocating new memory when you know that an existing object is no longer
used.  In D, you can also selectively allocate certain troublesome
objects with malloc/free instead (mixing both types of allocations is
perfectly fine in D; we are not Java where you're forced to use the GC
no matter what).
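
To illustrate the "preallocate and reuse" part, here is a sketch
(fillSquares and reusesStorage are made-up names).  It assumes nothing
else still refers to the old contents of the buffer, which is what
assumeSafeAppend requires.

```d
// Refills `buf` with the squares 0^2 .. (n-1)^2, reusing its storage.
void fillSquares(ref int[] buf, size_t n)
{
    buf = buf[0 .. 0];        // keep the pointer, drop the contents
    buf.assumeSafeAppend();   // promise that nobody else uses the old elements
    foreach (i; 0 .. n)
        buf ~= cast(int)(i * i);
}

// Returns true if every refill stayed within the preallocated block.
bool reusesStorage(size_t rounds, size_t n)
{
    int[] buf;
    buf.reserve(n);           // preallocate once, before the loop
    fillSquares(buf, n);
    const first = buf.ptr;
    foreach (r; 0 .. rounds)
    {
        fillSquares(buf, n);
        if (buf.ptr !is first)
            return false;     // a refill had to reallocate
    }
    return true;
}
```

Without the assumeSafeAppend call, appending to the emptied slice
would allocate a fresh block each round, which is exactly the kind of
garbage that triggers collections.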

(2) Use D's GC control mechanisms to exercise some control over when
collections happen. By default, collections ONLY ever get triggered if
you try to allocate something and the heap has run out of memory.  Ergo,
if you don't allocate anything, GC collections are guaranteed not to
happen.  Use GC.disable and GC.collect to control when collections
happen.  In one of my projects, I got a 40% performance boost by using
GC.disable and using my own schedule of GC.collect, because the profiler
revealed that collections were happening too frequently.  The exact
details of what to do will depend on your project, of course, but my
point is, there are plenty of tools at your disposal to exercise some
degree of control.

Or if (1) and (2) are not enough for your particular case, you can
always resort to the nuclear option: slap @nogc on main() and use
malloc/free to your heart's content. IME, however, this is very, very
rarely called for.  Generally a combination of (1) and (2) has been more
than adequate to fix my GC issues.
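
For completeness, a sketch of what the nuclear option looks like
(average is an illustrative helper, not part of any library):

```d
import core.stdc.stdlib : malloc, free;

// @nogc makes the compiler reject anything in this function that could
// allocate from the GC (new, ~=, array literals, closures, ...).
double average(const(double)[] values) @nogc nothrow
{
    double sum = 0;
    foreach (v; values)
        sum += v;
    return values.length ? sum / values.length : 0;
}

int main() @nogc nothrow
{
    enum n = 4;
    auto p = cast(double*) malloc(n * double.sizeof);
    if (p is null)
        return 1;
    scope (exit) free(p);       // manual, deterministic cleanup

    double[] data = p[0 .. n];
    foreach (i, ref x; data)
        x = i + 1;              // 1, 2, 3, 4
    // auto bad = new int[](8); // would not compile: `new` uses the GC
    return average(data) == 2.5 ? 0 : 1;
}
```

Everything called from a @nogc function has to be @nogc as well, so
this also rules out the parts of Phobos that allocate.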

T

-- 
Computerese Irregular Verb Conjugation: I have preferences.  You have biases.  He/She has prejudices. -- Gene Wirchenko