General performance tip about possibly using the GC or not

Jonathan M Davis via Digitalmars-d-learn digitalmars-d-learn at puremagic.com
Mon Aug 28 18:28:18 PDT 2017


On Tuesday, August 29, 2017 00:52:11 Cecil Ward via Digitalmars-d-learn 
wrote:
> I am vacillating - considering breaking a lifetime's C habits and
> letting the D garbage collector make life wonderful by just
> cleaning up after me and ruining my future C discipline by not
> deleting stuff myself.
>
> I don't know when the GC actually gets a chance to run.

Normally, it's only run when you call new. When you call new, if it thinks
that it needs to do a collection to free up some space, then it will.
Otherwise, it won't normally ever run, because it's not sitting in its own
thread the way it does in Java or C#. However, if you need it to run at a
particular time, you can call core.memory.GC.collect to explicitly tell it
to run a collection. Similarly, you can call GC.disable to make it so that a
section of code won't cause any collections (e.g. in a performance-critical
loop that can't afford for the GC to kick in), and then you can call
GC.enable to turn it back on again.
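
Something along these lines (just a sketch - the batch processing is made up,
but the GC calls are the real core.memory ones):

    import core.memory : GC;

    void processBatch(const(ubyte)[] batch) { /* placeholder for real work */ }

    void runHotLoop(const(ubyte)[][] batches)
    {
        GC.disable();              // suppress collections inside the hot loop
        scope (exit) GC.enable();  // re-enable on the way out, even on throw

        foreach (batch; batches)
            processBatch(batch);
    }

    void atAConvenientPause()
    {
        GC.collect();   // explicitly run a collection now
        GC.minimize();  // optionally return unused pages to the OS
    }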

> I am wondering if deleting the usual bothersome
> immediately-executed hand-written cleanup code could actually
> improve performance in some situations. If the cleanup is done
> later by the GC, then it might happen when the processor would
> otherwise be waiting for I/O, in the top loop of an app, say? If
> so, this would amount to running that code effectively as
> 'low-priority' app-scheduled activity, when the process would be
> waiting anyway, moving CPU cycles to a later time when they don't
> matter. Is this a reasonable picture?
>
> If I carry on deleting objects / freeing / cleaning up as I'm
> used to, without disabling the GC, am I just slowing my code
> down? Plus (for all I know) the GC will still use at least some
> battery, or possibly even important CPU cycles, scanning and
> finding nothing to do, because I've already fully cleaned up.
>
> I suppose there might also be a difference in cache-friendliness:
> cleaning up immediately by hand might be working on hot memory,
> whereas the GC scanner coming along much later might have to deal
> with cold memory. But that may not matter if the activity is
> app-scheduled like low-priority work, or falls within time periods
> that merely eat into I/O-bound waits anyway.
>
> I definitely need to read up on this. Have never used a GC
> language, just decades of C and mountains of asm.

For a lot of stuff, GCs will actually be faster. It really depends on what
your code is doing. One aspect of this is that when you're doing manual
memory management or reference counting, you're basically spreading out the
collection across the program. It's costing you all over the place but isn't
necessarily costing a lot in any particular place. The GC on the other hand
avoids a lot of that cost as you're running, because your program isn't
constantly doing all of that work to free stuff up - but when the GC does
kick in to do a collection, then it costs a lot more for that moment than
any particular freeing of memory would have cost with manual memory
management. It's doing all of that work at once rather than spreading it
out. Whether that results in a more performant program or a less performant
program depends a lot on what you're doing and what your use case can
tolerate. For most programs, having the GC stop stuff temporarily really
doesn't matter at all, whereas for something like a real-time program, it
would be fatal. So, it really depends on what you're doing.

Ultimately, for most programs, it makes the most sense to just use the GC
and optimize your program where it turns out to be necessary. That could
mean disabling the GC in certain sections of code, or it could mean managing
certain memory manually, because it's more efficient to do so in that case.
Doing stuff like allocating a lot of small objects and throwing them away
will definitely be a performance problem for the GC, but it's not all that
great for manual memory management either. A lot of the performance gains
come from doing stuff on the stack where possible, which is one area where
ranges tend to shine.
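
As a rough illustration of the range point (the numbers and the pipeline here
are invented), a lazy range pipeline does its work element by element without
allocating intermediate arrays:

    import std.algorithm : filter, map, sum;
    import std.range : iota;

    long sumOfSquaresOfMultiplesOf3()
    {
        // Lazy pipeline: no intermediate arrays are allocated; each
        // element flows through filter and map as sum consumes it.
        return iota(1L, 1_000_000L)
            .filter!(n => n % 3 == 0)
            .map!(n => n * n)
            .sum;
        // Materialising each stage with std.array.array instead would
        // allocate a GC-managed array per intermediate step.
    }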

Another thing to consider is that some programs will need to have specific
threads not managed by the GC so that they can't be stopped during a
collection (e.g. a program with an audio pipeline will probably not want the
audio processing running on a GC-managed thread), and that's one way to avoid
a performance hit from the GC. That's a fairly atypical need, though it's
critical for certain types of programs.
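
If you do end up needing that, druntime has a hook for it. A rough sketch,
assuming core.thread's thread_detachThis for the detach (the audio worker
itself is invented, and a detached thread must never allocate GC memory or
hold the only reference to a GC-managed object):

    import core.thread : thread_detachThis;

    // Once detached, the GC will not suspend this thread during a
    // collection - in exchange, it must stay away from GC-owned memory.
    void audioWorker()
    {
        thread_detachThis();

        while (true)
        {
            // ... fill the next audio buffer using non-GC memory only ...
        }
    }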

All in all, switching to using the GC primarily will probably take a bit of
a shift in thinking, but typical D idioms do tend to reduce the need for
memory management in general and reduce the negative impacts that can come
with garbage collection. And ultimately, some workloads will be more
efficient with the GC. It's my understanding that relatively few programs
end up needing to play games where they do things like disable the GC
temporarily, but the tools are there if you need them. And profiling should
help show you where bottlenecks are.

Ultimately, I think that using the GC is a lot better in most cases. It's
memory safe in a way that manual memory management can't be, and it frees you
up from a lot of tedious stuff that often comes with manual memory
management. But it's not a panacea either, and the fact that D provides ways
to work around it when it does become a problem is a real boon.

> Any general guidance on how to optimise CPU usage, particularly
> responsiveness?
>
> One pattern I used to use when writing service processes (server
> apps) is deferring compute tasks with a kind of 'post this action'
> call, which adds an entry to a queue; the entry is a function
> address plus an argument list and represents work to be done
> later. In the top loop, the app then executes these 'posted' jobs
> at app-scheduled low priority, relative to other activities and
> all handling of I/O and timer events, when it has nothing else to
> do, by simply calling through the function pointer in a post-queue
> entry. So it's a bit like setting a timer for 0 ms and passing a
> callback function. Terminology: 'DFC' (deferred function call) or
> lazy, late execution might be other terms for it. I'm wondering if
> using the garbage collector well might fit into this familiar
> pattern? Is that fair? And might it actually even help performance
> for me if I'm lucky?

I don't know. You'd probably have to try it and see. Predicting the
performance characteristics of programs is generally difficult, and most
programmers get it wrong a surprisingly large part of the time. That's part
of why profiling is so important, much as most of us tend to forget about it
until we run into a problem that absolutely requires it.
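
If you do try it, one simple experiment in that kind of top loop (purely a
sketch - the queue, havePendingIO, and the jobs are stand-ins for your own
event-loop machinery) is to kick off a collection explicitly when the loop is
otherwise idle:

    import core.memory : GC;

    alias Job = void delegate();

    Job[] postQueue;   // stand-in for your "posted action" queue

    void topLoop(bool delegate() havePendingIO)
    {
        while (true)
        {
            // ... dispatch I/O and timer events ...

            if (postQueue.length)
            {
                auto job = postQueue[0];
                postQueue = postQueue[1 .. $];
                job();                  // run one deferred job
            }
            else if (!havePendingIO())
            {
                GC.collect();           // truly idle: pay the GC cost now
            }
        }
    }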

A big part of the question of whether the GC helps performance has to do
with how much garbage you're producing and how quickly you churn through it.
Allocating a bunch of stuff that you don't need to free for a while can
definitely work better with a GC, but if you're constantly allocating and
deallocating, then you can run into serious problems with both the GC and
manual memory management, and which is worse is going to depend on a number
of factors.
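
If churn does turn out to be the problem, one common mitigation under either
scheme (again just a sketch - Record, parseRecord, and the sizes are made up)
is to allocate once up front and reuse that storage rather than allocating per
iteration:

    import std.array : appender;

    struct Record { int id; }                          // made-up payload

    Record parseRecord(const(char)[] line) { return Record(0); }   // stub

    void processLines(const(char)[][] lines)
    {
        auto records = appender!(Record[])();
        records.reserve(lines.length);                 // one allocation up front

        foreach (line; lines)
            records.put(parseRecord(line));            // no per-iteration allocation

        // ... work with the accumulated records ...
    }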

- Jonathan M Davis


