Significant GC performance penalty
Rob T
rob at ucora.com
Fri Dec 14 10:27:26 PST 2012
I created a D library wrapper for sqlite3 that uses a dynamically
constructed result list for returned records from a SELECT
statement. It works in a similar way to a C++ version that I
wrote a while back.
The D code is idiomatic D, not a straight port of my earlier C++
code, so it makes use of many of D's features, and one of
them is the garbage collector.
When running comparison tests between the C++ version and the D
version, both compiled using performance optimization flags, the
C++ version runs 3x faster than the D version, which was very
unexpected. If anything, I was hoping for a performance boost out
of D, or at least the same performance level.
I remembered reading about people having performance problems
with the GC, so I tried a quick fix, which was to disable the GC
before the SELECT is run and re-enable it afterwards. The result
of doing that was a 3x performance boost, making the DMD-compiled
version run almost as fast as the C++ version. The DMD-compiled
version is now only 2 seconds slower on my stress test runs of a
SELECT that returns 200,000+ records with 14 fields. Not too bad!
I may get identical performance if I compile using gdc, but that
will have to wait until it is updated to 2.061.
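
For anyone wanting to try the same quick fix, here is a minimal
sketch of the disable/re-enable pattern, not my actual wrapper
code. Row and buildResult are hypothetical stand-ins for the
sqlite3 result list; only GC.disable and GC.enable from
core.memory are the real API. The scope (exit) guard is there so
the GC is re-enabled even if an exception is thrown part way
through.

```d
import core.memory : GC;
import std.conv : to;

// Hypothetical stand-in for one record returned by a SELECT.
struct Row
{
    string[] fields;
}

// Hypothetical stand-in for the loop that builds the result list.
Row[] buildResult(size_t rowCount, size_t fieldCount)
{
    // Suspend automatic collections while the list is being built.
    GC.disable();
    // Re-enable on every exit path, including exceptions.
    scope (exit) GC.enable();

    Row[] rows;
    rows.reserve(rowCount);
    foreach (i; 0 .. rowCount)
    {
        auto fields = new string[](fieldCount);
        foreach (j, ref f; fields)
            f = to!string(i * fieldCount + j);
        rows ~= Row(fields);
    }
    return rows;
}

unittest
{
    auto rows = buildResult(1_000, 14);
    assert(rows.length == 1_000);
    assert(rows[0].fields.length == 14);
}
```

As far as I can tell from the core.memory documentation, every
GC.disable call needs a matching GC.enable call, and the runtime
may still collect while disabled if it has to, for example when
it runs out of memory. The unittest block can be run with
something like dmd -unittest -main -run on the file.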
Fixing this was a major relief since the code is expected to be
used in a commercial setting. I'm wondering, though, why the GC
causes such a large penalty, and what negative effects, if any,
there will be from disabling the GC temporarily. I know that
memory won't be reclaimed until the GC is re-enabled, but is
there anything else to worry about?
I feel it's worth commenting on my experience as feedback for
the D developers and anyone else starting off with D.
Coming from C++, I *really* did not like having the GC. It made
me very nervous, but now that I'm used to having it, I've come to
like having it up to a point. It really does change the way you
think and code. However, as I've discovered, you still have to
always be thinking about memory management issues because the GC
can impose a huge performance penalty in certain situations. I
also NEED to know that I can always go full manual where
necessary. There's no way I would want to give up that kind of
control.
The trade-off with having a GC seems to be that, by default, C++
apps will perform considerably faster than equivalent D apps
out-of-the-box, simply because the manual memory management is
fine tuned by the programmer as the development proceeds. With D,
when you simply let the GC take care of business, then you are
not necessarily fine tuning as you go along, and when you do not
take the resulting performance hit into consideration it means
that your apps will likely perform poorly compared to a C++
equivalent. However, building the equivalent app in D is a much
more pleasant experience in terms of the programming productivity
gain. The code is simpler to deal with, and there's less to worry
about with pointers and other memory management issues.
What I have not yet had the opportunity to explore is using D in
full manual memory management mode. My understanding is that if I
take that route, then I cannot use certain parts of the std lib,
and will also lose a few of the nice features of D that make it
fun to work with. I'm not fully clear, though, on what to expect,
so if there's any detailed information to look at, it would be a
big help.
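
To make the question concrete, this is roughly what I imagine the
manual route looks like for a simple case. It is only a sketch
under my own assumptions: Record and sumValues are hypothetical
names, and the buffer comes from the C heap via malloc and free
in core.stdc.stdlib, so the GC never sees it.

```d
import core.stdc.stdlib : malloc, free;

// Hypothetical fixed-size record holding no references to GC memory.
struct Record
{
    int id;
    double value;
}

// Fills a manually managed buffer with `count` records and sums `value`.
double sumValues(size_t count)
{
    auto p = cast(Record*) malloc(count * Record.sizeof);
    if (p is null)
        return double.nan; // no buffer was returned
    // Release the buffer on every exit path.
    scope (exit) free(p);

    // Slicing the raw pointer gives a normal D array view, no GC involved.
    Record[] records = p[0 .. count];
    foreach (i, ref r; records)
    {
        r.id = cast(int) i;
        r.value = i * 0.5;
    }

    double total = 0;
    foreach (ref r; records)
        total += r.value;
    return total;
}

unittest
{
    assert(sumValues(4) == 3.0);
}
```

The catch, as I understand it, is that things like new, array
appending and concatenation, associative arrays and closures all
allocate from the GC heap, so they have to be avoided on this
path. Also, if a malloc'd buffer ever stores pointers to GC
objects, it would need to be registered with GC.addRange so those
objects are not collected.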
I wonder what can be done to allow a programmer to go fully
manual, while not losing any of the nice features of D?
Also, I think everyone agrees we really need a better GC, and I
wonder once we do get a better GC, what kind of overall
improvements we can expect to see?
Thanks for listening.
--rt