Adding Java and C++ to the MQTT benchmarks or: How I Learned to Stop Worrying and Love the Garbage Collector

H. S. Teoh hsteoh at quickfur.ath.cx
Wed Jan 8 11:15:37 PST 2014


On Wed, Jan 08, 2014 at 11:35:19AM +0000, Atila Neves wrote:
> http://atilanevesoncode.wordpress.com/2014/01/08/adding-java-and-c-to-the-mqtt-benchmarks-or-how-i-learned-to-stop-worrying-and-love-the-garbage-collector/

I have to say, this is also my experience with C++ after I learnt D.
Writing C++ is just so painful, so time-consuming, and so not rewarding
for the amount of effort you put into it, that I just can't bring myself
to write C++ anymore when I have the choice. And manual memory
management is a big part of that time sink. Which is why I believe that
a lot of the GC-phobia among the C/C++ folk is misplaced.  I can
sympathise, though, because coming from a C/C++ background myself, I was
highly skeptical of GC'd languages, and didn't find it to be a
particularly appealing aspect of D when I first started learning it.

But as I learned D, I eventually got used to having the GC around, and
discovered that not only did it reduce the number of memory bugs
dramatically, it also increased my productivity dramatically: I never
realized just how much time and effort it took to write code with manual
memory management: you constantly have to think about how exactly you're
going to be storing your objects, who it's going to get passed to, how
to decide who's responsible for freeing it, what's the best strategy for
deciding who allocates and who frees. These considerations permeate
every aspect of your code, because you need to know whether to
pass/return an object* to someone, and whether this pointer implies
transfer of ownership or not, since that determines who's responsible for
freeing it, etc. Even with C++'s smart pointers, you still have to decide
which one to use, and what pitfalls are associated with them (beware of
cycles with refcounted pointers, passing an auto_ptr to somebody transfers
ownership and nulls out your copy, etc.). It's like income tax: on just
about every line of code you write, you have to pay the "memory
management tax" of extra mental overhead and time spent fixing pointer
bugs in order to not get the IRS (Invalid Reference Segfault :P)
knocking on your shell prompt.
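
To make the refcount-cycle pitfall concrete, here's a minimal C++ sketch (a
toy example I just made up, nothing from real code): two objects that hold
counted references to each other never see their counts drop to zero, so
neither dtor ever runs.

#include <cstdio>
#include <memory>

// Two nodes pointing at each other through shared_ptr keep each other
// alive forever: the classic refcount cycle.
struct Node {
    std::shared_ptr<Node> other;
    ~Node() { std::puts("Node destroyed"); }
};

int main() {
    {
        auto a = std::make_shared<Node>();
        auto b = std::make_shared<Node>();
        a->other = b;
        b->other = a;   // cycle: each use_count stays above zero
    }   // scope ends, but "Node destroyed" is never printed: both leak
    // The usual fix is to make one of the two links a std::weak_ptr.
    return 0;
}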

Manual memory management is a LOT of effort, and to be quite honest,
unless you're writing an AAA 3D game engine, you don't *need* that last
5% performance improvement that manual memory management *might* give
you. That is, if you get it right. Which most C/C++ coders don't.

Case in point: recently at work I had the dubious pleasure of
encountering some C code with a particularly pathological memory
mismanagement bug.  To give a bit of context: in the past, this part of
the code used to be completely manually-managed with malloc's and free's
everywhere. Just like most C code that implements business logic, it
worked well when the original people who wrote it maintained it. But
life happens, and people leave and new people come, so over time, the
code degenerated into a sad mess riddled with memory leaks and pointer
bugs everywhere. So the team lead finally put his foot down, and
replaced much of that old code with a ref-counted infrastructure. (This
being C, installing a GC was too much work; plus, GC-phobia is pretty
strong in these parts.) After all, ref-counting is the silver bullet to
cure manual memory management troubles, right? Well...

Fast-forward a couple o' years, and here I am, helping a coworker figure
out why the code was crashing. Long story short, we eventually found
that it was keeping a ref-counted container that contained two (or more)
ref-counted objects, each of which represented an async task spawned by
the parent process. The idea behind this code was to run multiple
computations on the same data, and use the results from whichever task
finishes first; the remaining task(s) would simply be terminated. So
*somebody*, noting that we had a ref-counted system, decided to take
advantage of that fact by setting it up so that when a task finishes, it
will destroy the sub-object it's associated with, and the dtor of this
object (which will be automatically invoked by the ref-counting system)
will then walk the container and destruct every other object, which in
turn will terminate their associated tasks. Anybody spot the problem
yet? The reasoning (as far as I can reconstruct it, anyway) goes: "In
order for the dtor to destruct the remaining tasks, we just have to
decrement the refcount on the container object; since there should only
be 1 reference to it, this will cause it to dip to 0, and then the
container's dtor will take care of cleaning up all the other tasks. But
in order for the task, when it finishes, to trigger the dtor of its
associated sub-object, the refcount of the sub-object must be 1,
otherwise the dtor won't trigger and we'll get stuck. So either the
container's reference to the sub-object shouldn't be counted, or the
task's reference to the sub-object shouldn't be counted. ..." And it
just goes downhill from there.
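
To illustrate the dilemma, here's a toy C++ model of the same setup
(invented names, with shared_ptr standing in for the hand-rolled C
refcounting, so it's a sketch rather than the actual code): when every
reference is honestly counted, the cascade they wanted simply never fires,
which is exactly the temptation to leave some references uncounted.

#include <cstdio>
#include <memory>
#include <vector>

struct Task {
    int id;
    explicit Task(int i) : id(i) {}
    ~Task() { std::printf("task %d dtor: cancel async work here\n", id); }
};

int main() {
    // The container holds a counted reference to each spawned task.
    std::vector<std::shared_ptr<Task>> container;
    container.push_back(std::make_shared<Task>(1));
    container.push_back(std::make_shared<Task>(2));

    // The async machinery holds its own counted reference to "its" task.
    std::shared_ptr<Task> runningRef = container[0];

    // Task 1 finishes; the machinery drops its reference and expects the
    // dtor to fire and cascade-cancel the rest...
    runningRef.reset();

    // ...but the container's reference keeps the count above zero, so no
    // dtor runs at all. Hence the temptation to leave some reference
    // uncounted, which is exactly where the dangling pointers came from.
    std::printf("task 1 use_count after it 'finished': %ld\n",
                (long)container[0].use_count());
    return 0;
}

And of course, the moment you start leaving references uncounted, you're
back to manually tracking ownership, just with extra machinery in the way.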

So much for refcounting solving memory-management woes. I'm becoming
more and more convinced that most coders have no idea how to write
manual memory management code properly. Or ref-counted code, for that
matter. For all the time and effort it took to implement a ref-counting
system in *C*, no less, and the time and effort it took to fix all the
bugs associated with it, now somebody conveniently goes and subverts the
ref-counting system, and we wonder why the code isn't working? And this
isn't even performance-critical code; it's *business logic*, for crying
out loud.  Sighh...

When I code in D, I discover to my pleasant surprise how much extra time
I have (and how much more spare mental capacity I have) now that I don't
have to continuously think about memory management. Sure, some of the
resulting code may not be squeezing every last drop of juice from my
CPU, but 95% of the time, it doesn't even matter anyway, 'cos it's not
even the performance bottleneck. One of the bad habits of us C/C++ coders
(myself included) is that we like to write code in a funny, cramped
style that we've convinced ourselves is "optimal code". This includes
insistence on micro-managing memory allocations. However, most of this
is premature optimization, which can be readily proved by running a
profiler on your program, upon which you discover that *none* of your
meticulously-coded fine-tuned memory management code and carefully
written (aka unreadable and unmaintainable) loops is even anywhere *near*
the real performance bottleneck, which turns out to be a call to
printf() that you forgot to comment out. Or a strlen() whose necessity
was forced upon you because C/C++ is still suffering from that age-old
mistake of conflating arrays with pointers. (Honestly, the cost of those
forced strlen() calls in inconvenient places easily outweighs 99% of the
meticulously-crafted optimizations you spent 40 hours writing.)
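
For the record, here's what that strlen() tax tends to look like
(hypothetical functions, not from any particular codebase): the bare
pointer carries no length, so the obvious loop quietly recomputes it on
every iteration, and an O(n) pass becomes O(n^2).

#include <cstddef>
#include <cstring>

// Re-scans the whole string on every iteration, because the pointer alone
// doesn't know how long the string is.
std::size_t count_spaces_slow(const char* s) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < std::strlen(s); ++i)  // O(n) per pass
        if (s[i] == ' ') ++n;
    return n;
}

// Carrying the length alongside the pointer (what a D slice gives you for
// free) makes the repeated scans go away.
std::size_t count_spaces(const char* s, std::size_t len) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < len; ++i)
        if (s[i] == ' ') ++n;
    return n;
}

int main() {
    const char* msg = "hello world from the mailing list";
    // Both count the same spaces; only the amount of scanning differs.
    return (int)(count_spaces_slow(msg) - count_spaces(msg, std::strlen(msg)));
}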

The amount of headache (and time better spent thinking about more
important things, like how to implement an O(n log n) algorithm in place
of the current O(n^2) algorithm that will singlehandedly make *all* of
your other premature optimizations moot) saved by having a GC is almost
priceless.  Unless you're writing an AAA 3D game engine. Which only 5%
of us coders have the dubious pleasure of working on. :-P

Hooray for GC's, I say.


T

-- 
Дерево держится корнями, а человек - друзьями. [A tree is held up by its roots, and a man by his friends.]

