dsimcha at yahoo.com
Mon Dec 5 13:07:09 PST 2011
== Quote from Martin Nowak (dawg at dawgfoto.de)'s article
> I appreciate the recursion during mark, wanted to do this myself
> sometime ago but expected a little more gain.
The gain wasn't huge because, on the benchmark I have that involves a deep heap
graph, sweeping time dominates marking time. The performance gain for the mark
phase alone (which is the important part, because this is when the world needs
to be stopped) is ~20-30%.
> Some more ideas:
> - Do a major refactoring of the GC code, making it less reluctant
> to changes. Adding sanity checks or unit tests would be great.
> This probably reveals some obfuscated performance issues.
Not just obfuscated ones. I've wanted to fix an obvious performance bug for two
years and haven't, because the necessary refactoring would be unbelievably messy
and I'm afraid I'd break something. Basically, malloc() zeroes the bytes between
the size you requested and the size of the block actually allocated, to prevent
false pointers. That's reasonable. The problem is that it does so **while
holding the GC's lock**. Fixing this just for the case where malloc() is called
directly by the user would be easy; the hard part is fixing it when malloc()
gets called from realloc(), calloc(), etc.
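To make the issue concrete, here is a minimal sketch of moving the tail-zeroing outside the lock. All names here (`gcMalloc`, `roundToBin`, the `std::malloc` stand-in for pool allocation) are illustrative assumptions, not druntime's actual API:

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <mutex>

// Hypothetical sketch only; the real GC uses its own pools and lock.
static std::mutex gcLock;

// Stand-in for the allocator's size classes: round up to a bin size.
static std::size_t roundToBin(std::size_t n) {
    std::size_t bin = 16;
    while (bin < n) bin *= 2;
    return bin;
}

void* gcMalloc(std::size_t requested, std::size_t* binOut = nullptr) {
    std::size_t bin = roundToBin(requested);
    void* p;
    {
        std::lock_guard<std::mutex> lk(gcLock); // critical section: grab a block
        p = std::malloc(bin);                   // stand-in for pool allocation
    }
    // Zero the tail *after* releasing the lock: the bytes between
    // `requested` and the bin size must be cleared so stale data there
    // is not later scanned as false pointers, but no other thread has
    // to wait on the global lock while we clear them.
    if (p)
        std::memset(static_cast<char*>(p) + requested, 0, bin - requested);
    if (binOut)
        *binOut = bin;
    return p;
}
```

The messy part in the real code is that the realloc()/calloc() paths re-enter the allocation path while the lock is already held, so the zeroing can't simply be hoisted out in one place.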
> - Add more realistic GC benchmarks, just requires adding to
> druntime/test/gcbench using the new runbench. The tree1 mainly
> uses homogeneous classes, so this is very synthesized.
I'll crowdsource this. I can't think of any good benchmarks that are under a few
hundred lines with no dependencies but aren't pretty synthetic.
> - There is one binary search pool lookup for every scanned address in
> Should be a lot to gain here, but it's difficult. It needs a multilevel
> mixture of bitset/hashtab.
I understand the problem, but please elaborate on the proposed solution. You've
basically got a bunch of pools, each of which represents a range of memory
addresses rather than a single address (so a basic hashtable is out), and you
need to know which range a given pointer falls in. How would you beat binary
search, O(log N), for this?
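For reference, the per-address lookup being discussed looks roughly like this. The `Pool`/`findPool` names and layout are illustrative assumptions, not the actual GC types:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Sketch: each pool owns a contiguous, non-overlapping address range,
// and the pool table is kept sorted by base address.
struct Pool {
    std::uintptr_t base; // first address owned by this pool
    std::uintptr_t top;  // one past the last owned address
};

// One O(log N) binary search per scanned word: find the pool, if any,
// whose [base, top) range contains addr.
const Pool* findPool(const std::vector<Pool>& pools, std::uintptr_t addr) {
    // First pool whose base is strictly greater than addr...
    auto it = std::upper_bound(pools.begin(), pools.end(), addr,
        [](std::uintptr_t a, const Pool& p) { return a < p.base; });
    if (it == pools.begin())
        return nullptr; // addr is below every pool
    --it;               // ...then step back to the candidate pool
    return (addr < it->top) ? &*it : nullptr;
}
```

Since this runs once per scanned word during marking, shaving the lookup (e.g. with some coarser-grained first-level table, as the bitset/hashtable idea presumably intends) could matter even though each individual search is cheap.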
> - Reduce the GC roots range. I will have to work on this for
> shared library support anyhow.
Please clarify what you mean by "reduce" the roots range.
Thanks for the feedback/suggestions.