Significant GC performance penalty

Fri Dec 14 11:27:46 PST 2012

On Friday, 14 December 2012 at 18:27:29 UTC, Rob T wrote:
> I created a D library wrapper for sqlite3 that uses a 
> dynamically constructed result list for returned records from a 
> SELECT statement. It works in a similar way to a C++ version 
> that I wrote a while back.
>
> The D code is D code, not a cloned up version of my earlier C++ 
> code, so it makes use of many of the features of D, and one of 
> them is the garbage collector.
>
> When running comparison tests between the C++ version and the D 
> version, both compiled using performance optimization flags, 
> the C++ version runs 3x faster than the D version which was 
> very unexpected. If anything I was hoping for a performance 
> boost out of D or at least the same performance levels.
>
> I remembered reading about people having performance problems 
> with the GC, so I tried a quick fix, which was to disable the 
> GC before the SELECT is run and re-enable afterwards. The 
> result of doing that was a 3x performance boost, making the DMD 
> compiled version run almost as fast as the C++ version. The DMD 
> compiled version is now only 2 seconds slower on my stress test 
> runs of a SELECT that returns 200,000+ records with 14 fields. 
> Not too bad! I may get identical performance if I compile using 
> gdc, but that will have to wait until it is updated to 2.061.
>
> Fixing this was a major relief since the code is expected to be 
> used in a commercial setting. I'm wondering though, why the GC 
> causes such a large penalty, and what negative effects, if any, 
> there will be when disabling the GC temporarily. I know that 
> memory won't be reclaimed until the GC is re-enabled, but is 
> there anything else to worry about?
>
> I feel it's worth commenting on my experience as feedback for 
> the D developers and anyone else starting off with D.
>
> Coming from C++ I *really* did not like having the GC, it made 
> me very nervous, but now that I'm used to having it, I've come 
> to like having it up to a point. It really does change the way 
> you think and code. However as I've discovered, you still have 
> to always be thinking about memory management issues because 
> the GC can eat up a huge performance penalty under certain 
> situations. I also NEED to know that I can always go full 
> manual where necessary. There's no way I would want to give up 
> that kind of control.
>
> The trade off with having a GC seems to be that by default, C++ 
> apps will perform considerably faster than equivalent D apps 
> out-of-the-box, simply because the manual memory management is 
> fine tuned by the programmer as the development proceeds. With 
> D, when you simply let the GC take care of business, then you 
> are not necessarily fine tuning as you go along, and when you 
> do not take the resulting performance hit into consideration it 
> means that your apps will likely perform poorly compared to a 
> C++ equivalent. However, building the equivalent app in D is a 
> much more pleasant experience in terms of the programming 
> productivity gain. The code is simpler to deal with, and 
> there's less to worry about with pointers and other memory 
> management issues.
>
> What I have not yet had the opportunity to explore, is using D 
> in full manual memory management mode. My understanding is that 
> if I take that route, then I cannot use certain parts of the 
> std lib, and will also lose a few of the nice features of D 
> that make it fun to work with. I'm not fully clear though on 
> what to expect, so if there's any detailed information to look 
> at, it would be a big help.
>
> I wonder what can be done to allow a programmer to go fully 
> manual, while not losing any of the nice features of D?
>
> Also, I think everyone agrees we really need a better GC, and I 
> wonder once we do get a better GC, what kind of overall 
> improvements we can expect to see?
>
> Thanks for listening.
>
> --rt

I have lots of experience with GC-enabled languages, even for 
systems programming (Oberon & Active Oberon).

I think there are a few issues to consider:

- D's GC still has a lot of room to improve, so some of the 
issues you have found might eventually get improved;

- Having GC support does not mean calling new like crazy; one 
still needs to think about how to code in a GC-friendly way;

- Make proper use of weak references in case they are available;

- GC-enabled language runtimes usually offer ways to peek into 
the runtime and allow the developer to understand how the GC is 
working and what might be improved.
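To make the points above a bit more concrete, here is a rough sketch, not your actual wrapper: `Row` and `fetchRows` are hypothetical names standing in for the sqlite3 result list. It suppresses automatic collections while the result list is built, reserves the storage up front instead of growing it piece by piece, and then asks the runtime how much heap the GC holds. `GC.stats()` assumes a reasonably recent druntime.

```d
import core.memory : GC;
import std.array : appender;
import std.stdio : writefln;

// Hypothetical record type standing in for one row of a SELECT result.
struct Row
{
    int id;
    string name;
}

// Builds a result list of `expected` rows with automatic collections suppressed.
Row[] fetchRows(size_t expected)
{
    GC.disable();             // suppress automatic collections while we allocate
    scope (exit) GC.enable(); // re-enable on every exit path, including exceptions

    auto rows = appender!(Row[])();
    rows.reserve(expected);   // reserve capacity up front to avoid repeated regrowth

    foreach (i; 0 .. expected)
        rows.put(Row(cast(int) i, "row"));

    return rows.data;
}

void main()
{
    auto rows = fetchRows(200_000);
    assert(rows.length == 200_000);

    // Peek into the runtime: heap usage as seen by the collector.
    auto s = GC.stats();
    writefln("rows: %s, GC used: %s bytes, GC free: %s bytes",
             rows.length, s.usedSize, s.freeSize);
}
```

The `scope (exit)` line is the important part: it keeps the GC from staying disabled if the query throws halfway through. Memory allocated while collections are suppressed is only reclaimed later, so peak usage can be higher during the SELECT.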

The benefit of having a GC is a safer way to manage memory 
across multiple modules, especially when ownership is not 
clear.

Even in C++ I seldom do manual memory management nowadays when 
working on new codebases. Of course, others will have a different 
experience.

Other than that, thanks for sharing your experience.

--
Paulo