General performance tip about possibly using the GC or not

Mike Parker via Digitalmars-d-learn digitalmars-d-learn at puremagic.com
Mon Aug 28 18:48:04 PDT 2017


On Tuesday, 29 August 2017 at 00:52:11 UTC, Cecil Ward wrote:
> I am vacillating - considering breaking a lifetime's C habits 
> and letting the D garbage collector make life wonderful by just 
> cleaning up after me and ruining my future C discipline by not 
> deleting stuff myself.

It's not a panacea, but it's also not the bogeyman some people 
make it out to be. You can let the GC do its thing most of the 
time and not worry about it. For the times when you do need to 
worry about it, there are tools available to mitigate its impact.
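As a concrete example of those tools, core.memory exposes switches 
to control when collections happen. A minimal sketch (the buffer is 
just an illustrative allocation):

```d
import core.memory : GC;

void main()
{
    GC.disable();                 // suppress automatic collections
                                  // (the runtime may still collect if it must)
    auto data = new int[](1000);  // still allocates from the GC heap
    GC.enable();                  // automatic collections allowed again
    GC.collect();                 // or force one explicitly when it suits you
    assert(data.length == 1000);
}
```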

>
> I don't know when the GC actually gets a chance to run.

Only when memory is allocated from the GC, such as when you 
allocate via new, or use a built-in language feature that 
implicitly allocates (like array concatenation). And then, it 
only runs if it needs to.
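A short sketch of which operations can allocate, and how @nogc lets 
the compiler prove a function can't trigger a collection at all:

```d
import std.stdio;

void main()
{
    int[] a = [1, 2, 3];    // array literal: allocates from the GC heap
    auto b = a ~ [4, 5];    // concatenation: allocates
    a ~= 6;                 // appending may reallocate

    // A @nogc function is rejected at compile time if it does any
    // of the above, so no collection can ever start inside it.
    static int sum(in int[] xs) @nogc
    {
        int s;
        foreach (x; xs)
            s += x;         // pure iteration: no allocation
        return s;
    }

    writeln(sum(b));        // prints 15
}
```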

>
> I am wondering if deleting the usual bothersome 
> immediately-executed hand-written cleanup code could actually 
> improve performance in a sense in some situations. If the 
> cleanup is done later by the GC, then this might be done when 
> the processor would otherwise be waiting for io, in the top 
> loop of an app, say? And if so this would amount to moving the 
> code to be run effectively like 'low priority' app-scheduled 
> activities, when the process would be waiting anyway, so moving 
> cpu cycles to a later time when it doesn't matter. Is this a 
> reasonable picture?

When programming to D's GC, some of the same allocation 
strategies you use in C still apply. For example, in C you 
generally wouldn't allocate multiple objects in a critical loop 
because allocations are not cheap -- you'd preallocate them, 
possibly on the stack, before entering the loop. That same 
strategy is a win in D, but for a different reason -- if you 
don't allocate anything from the GC heap in the loop, then the GC 
won't run in the loop.
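The preallocation strategy looks the same in D as in C; the names 
here are hypothetical, but the shape is the point:

```d
import std.stdio;

void main()
{
    auto inputs = [1, 2, 3, 4];

    // Preallocate the working buffer once, outside the loop
    // (or use a fixed-size stack array when the bound is known).
    auto results = new int[](inputs.length);

    foreach (i, x; inputs)    // no GC allocation inside the loop,
        results[i] = x * x;   // so no collection can trigger here

    writeln(results);         // prints [1, 4, 9, 16]
}
```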

Multiple threads complicate the picture a bit. A background 
thread might trigger a GC collection when you don't want it to, 
but it's still possible to mitigate the impact. This is the sort 
of thing that isn't necessary to concern yourself with in the 
general case, but that you need to be aware of so you can 
recognize it when it happens.

An example that I found interesting was the one Funkwerk 
encountered when the GC was causing their server to drop 
connections [1].


>
> If I carry on deleting objects / freeing / cleaning up as I'm 
> used to, without disabling the GC, am I just slowing my code 
> down? Plus (for all I know) the GC will use at least some 
> battery or possibly actually important cpu cycles in scanning 
> and finding nothing to do all the time because I've fully 
> cleaned up.

You generally don't delete or free GC-allocated memory. You can 
call destroy on GC-allocated objects, but that just calls the 
destructor and doesn't trigger a collection. And whatever you do 
with the C heap isn't going to negatively impact GC performance. 
You can trigger a collection by calling GC.collect. That's a 
useful tool in certain circumstances, but it can also hurt 
performance by forcing collections when they aren't needed.
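To make the destroy/collect distinction concrete, a small sketch 
(the Resource class is invented for illustration):

```d
import core.memory : GC;

class Resource
{
    static bool finalized;
    ~this() { finalized = true; }
}

void main()
{
    auto r = new Resource;
    destroy(r);                  // runs the destructor now...
    assert(Resource.finalized);  // ...but does not free the memory
                                 // or trigger a collection
    GC.collect();                // explicit collection: a tool for
                                 // specific situations, not a habit
}
```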

There are two fundamental mitigation strategies you can follow in 
the general case: 1) minimize the number of allocations, and 2) 
keep the size of allocations as small as possible. The first 
decreases the number of opportunities for a collection to occur, 
and the second helps keep collection times shorter. That doesn't 
mean you should always work to avoid the GC -- just be smart about 
how and when you allocate, as you would in C and C++.
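One idiomatic way to apply both strategies at once is to reserve 
capacity up front with std.array.appender, so a growing array costs 
one allocation instead of many:

```d
import std.array : appender;

void main()
{
    auto app = appender!(int[])();
    app.reserve(1000);           // single allocation up front

    foreach (i; 0 .. 1000)
        app.put(i);              // no further GC allocation needed

    assert(app[].length == 1000);
}
```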

>
> I suppose there might also be a difference in 
> cache-friendliness as cleaning up immediately by hand might be 
> working on hot memory, but the GC scanner coming along much 
> later might have to deal with cold memory, but it may not 
> matter if the activity is app-scheduled like low priority work 
> or is within time periods that are merely eating into io-bound 
> wait periods anyway.
>
> I definitely need to read up on this. Have never used a GC 
> language, just decades of C and mountains of asm.

You might start with the GC series on the D Blog [2]. The next 
post (Go Your Own Way Part Two: The Heap) is coming some time in 
the next couple of weeks.

>
> Any general guidance on how to optimise cpu usage, particularly 
> responsiveness.

If it works for C, it works for D. Yes, the GC can throw you into 
a world of cache misses, but again, smart allocation strategies 
can minimize the impact.

Having worked quite a bit with C, Java, and D, my sense is it's 
best to treat D more like C than Java. Java programmers have 
traditionally had little support for optimizing cache usage 
(there are libraries out there now that can help, and I hear 
there's movement to finally bring value type aggregates to the 
language), and with the modern GC implementations as good as they 
are it's recommended to avoid the strategies of the past (such as 
pooling and reusing objects) in favor of allocating as needed. In 
D, you have the tools to optimize cache usage (such as choosing 
contiguous arrays of efficiently laid out structs over 
GC-allocated classes), and the GC implementation isn't nearly as 
shiny as those available for Java. So I think it's more natural 
for a C programmer with little Java experience to write efficient 
code in D than the converse. Don't overthink it.
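The "contiguous arrays of structs" point can be sketched in a few 
lines (Point is a made-up example type):

```d
struct Point { double x, y; }   // value type: no per-object GC allocation

void main()
{
    // One contiguous allocation holding all the data, laid out
    // cache-friendly -- versus an array of class references, where
    // each object would be a separate GC allocation scattered
    // across the heap.
    auto points = new Point[](1000);

    double sum = 0;
    foreach (ref p; points)
    {
        p.x = 1;
        sum += p.x;
    }
    assert(sum == 1000);
}
```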

>
> One pattern I used to use when writing service processes 
> (server apps) is that of deferring compute tasks by using a 
> kind of 'post this action' which adds an entry into a queue, 
> the entry is a function address plus arg list and represents 
> work to be done later. In the top loop, the app then executes 
> these 'posted' jobs later at app-scheduled low priority 
> relative to other activities and all handling of io and timer 
> events, when it has nothing else to do, by simply calling 
> through the function pointer in a post queue entry. So it's a 
> bit like setting a timer for 0 ms, passing a callback function. 
> Terminology - A DFC or lazy, late execution might be other 
> terms. I'm wondering if using the garbage collector well might 
> fit into this familiar pattern? That fair? And actually even 
> help peformance for me if I'm lucky?

The GC will certainly simplify the implementation in that you can 
allocate your arg list and not worry about freeing it, but how it 
affects performance is anyone's guess. That largely depends on 
the points I raised above: how often you allocate and how much.
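For what it's worth, the "post this action" pattern falls out 
naturally with delegates: the closure's captured arguments are 
GC-allocated and stay alive until the deferred call has run, with 
no manual freeing of arg lists. A minimal sketch:

```d
import std.stdio;

void main()
{
    void delegate()[] queue;

    // "Post" an action: the capture of arg is GC-allocated,
    // and the GC keeps it alive until the job has run.
    void post(int arg)
    {
        queue ~= () { writeln("handled ", arg); };
    }

    post(1);
    post(2);

    // Later, in the top loop, when nothing else is pending:
    foreach (job; queue)
        job();                   // prints "handled 1", "handled 2"
    queue.length = 0;            // done; the GC reclaims the captures
}
```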


[1] https://dlang.org/blog/2017/07/28/project-highlight-funkwerk/
[2] https://dlang.org/blog/the-gc-series/



