Linus with some good observations on garbage collection

Timon Gehr timon.gehr at gmx.ch
Fri Apr 22 14:04:22 PDT 2011


Brad Roberts wrote:
> Also add to it that in many cases you're dealing with a threaded environment, >
so those refcounts have to be locked
> (either via mutexes, or more commonly just atomic) operations which are far more
expensive than non-atomic.  More so
> when there's actual contention for the refcounted resource.

That is only a problem if the reference count of that resource changes at a very
high frequency. The described problem also implies that the threads would not need
any form of synchronization for the data (otherwise the reference count certainly
would not be a bottleneck.)

I cannot, at the moment, think of a real-world example where this would not imply
bad design. Can you help me out?

Michael Stover wrote:
> I'd like to say "were proved" rather than "are believed", but I don't actually
know where to go for such evidence.
>  However, I do believe many scripting languages, such as python, eventually
ditched the reference counting technique for
> generational, and Java has very fast GC, so I am inclined to believe those
real-life solutions than Linus.
>
> Mike

Well, the GC may be fast when compared to other GCs, but it has to be designed to
run on general data whose reference structure can be arbitrary. Often, the
objects/references have a somewhat specialized structure that a smart programmer
can exploit, especially if the references point to private data. But that only
matters if performance is of prime interest, and the gains may be not very big.

But, as pointed out by Linus, the prime performance problem is _not_ the GC, but
the mindset that comes with it. Most programmers that "grew up" in a managed
environment tend to use very many "new" keywords all over their code, instead of
allocating large chunks of memory at once. (Java/C#/etc encourage you to do this.)
When they then try to write a C++ program, they do the same. The resulting memory
bugs are then blamed on the lack of a GC (you can have GC in C/C++, but most of
the people I have talked to do not know this.) They then happily change back to
Java, that has a "very fast GC".

The important thing to note here is that the work required to deallocate all these
many memory locations does not magically disappear, but it is delegated to the GC,
which will mostly do it faster and more reliable than a programmer which has to do
it manually. But the problem does not lie in the deallocations, it's in the
allocations.

Consider this analogy:

Scenario 1: Many people like candy. Those are wrapped in colorful little pieces of
paper. Every day, every person buys one piece of candy in the candy shop (new
Candy()!) and on the way back home they throw away the wrapping paper somewhere on
the street (reassign reference). Those are garbage. In the evening, some creepy
guy comes to search the whole street for those small pieces of paper. Would you
call that guy a garbage collector? He collects garbage, but in the real world
garbage collectors work more like this:

Scenario 2: Still, many people like candy. Every person buys a bag of candy in the
candy shop once a year (new Candy[365]). When all the candy is eaten, they put all
the garbage in one bag and put it to their front door (reassign reference to whole
array). A very handsome guy collects all those bags. He is very much more
efficient than the guy in example 1. (Arguably, memory usage is bigger in that
particular example, but in computer programs, the allocating process can reuse the
memory. The analogy breaks down here.)

Note that I am not saying that garbage collection is bad. If the references form a
very complicated structure, or if a reference to the containing object is not
necessarily kept (iE. array slicing), it can be very useful as well as faster than
manual memory management. Also, custom allocators can reduce the
lots-of-small-allocations problem, but they have some overhead too. Advanced GCs
may do this automatically, I don't know.

The reason Java is garbage collected is _not_ performance, but primarily
reliability. In big programming projects, it can be hard to know where a reference
to an object may be kept, if there are circular references etc, as programs keep
expanding without the programmers understanding the whole project in detail. GC
also removes bugs related to memory management. I think true and reliable OO is
not possible without a GC. The downside is, that many Java/... programmers don't
concern themselves much with the internals of memory allocations, and are _very_
proud of it. This is also the reason I think it is a bad idea to deprecate D's
'delete'.


-Timon


More information about the Digitalmars-d mailing list