[D-runtime] Precise garbage collection

Rainer Schuetze r.sagitario at gmx.de
Fri Aug 9 01:47:33 PDT 2013


On 08.08.2013 16:06, Leandro Lucarella wrote:
> On Wed, Aug 07, 2013 at 10:42:25PM +0200, Rainer Schuetze wrote:
>> It currently even fails to compile the druntime unittests because of
>> the reported problems with the RTInfo generation. Maybe it is better
>> to let it fail in public ;-)
>
> :O
> I didn't even try to compile it, to be honest; I just looked at the code.
> Are you saying it is not really working as it is now? I'm sorry, I'm not
> familiar with all the RTInfo stuff.

Unfortunately, for non-trivial programs it does not work out of the box, 
because you easily get link errors, mostly with respect to associative 
arrays. You can usually work around these by adding aliases to the 
corresponding AssociativeArray(Key,Value) template somewhere. I recently 
managed to reduce it to something manageable:

http://d.puremagic.com/issues/show_bug.cgi?id=10786
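
Something along these lines is usually enough to pull in the missing 
symbols (the name _aaWorkaround is just illustrative):

```d
// Hedged sketch of the workaround: explicitly instantiating the
// druntime template behind V[K] somewhere in the program forces the
// compiler to emit the symbols the linker complains about.
alias AssociativeArray!(uint, uint) _aaWorkaround;
```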

The worse issue is the silent non-generation of the RTInfo pointer, 
though the compiler replaces it with 0 or 1, which can still be used for 
conservative scanning.

http://d.puremagic.com/issues/show_bug.cgi?id=10442
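
For reference, a scanner can still make use of those two magic values; a 
rough sketch of the dispatch (scanConservatively and scanPrecisely are 
hypothetical helpers, not actual GC functions):

```d
// Illustrative only: interpret the rtInfo value attached to a block.
void scanBlock(void* base, size_t size, const(void)* rtInfo)
{
    if (rtInfo is cast(const(void)*) 0)
        return;                         // 0: block contains no pointers
    else if (rtInfo is cast(const(void)*) 1)
        scanConservatively(base, size); // 1: no bitmap, treat every word
                                        //    as a potential pointer
    else
        scanPrecisely(base, size,
                      cast(const(size_t)*) rtInfo); // real RTInfo:
                                                    // a pointer bitmap
}
```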

>
>> I also wanted to implement a different version for comparison, that
>> stores the TypeInfo pointer in the allocation. I have a first
>> attempt but it is quite a bit slower as it has to do some dynamic
>> casting during scanning because of the way arrays are implemented.
>> Some optimizations are needed to make an unbiased comparison, but
>> other stuff distracted from doing this.
>
> The old precise scanning patch I used with the concurrent collector did
> that and the overhead was considerable:
> http://www.llucax.com.ar/blog/blog/post/098ceae8
> (and http://www.llucax.com.ar/blog/blog/post/1490c03e also analyzes the
> performance, but it is a little confusing; I'm not sure what happened there
> because what I said in the text is not really reflected in the graphs
> :P).

Thanks for the links; I think I already saw them long ago, but I reread 
them now. I'm not sure we can blame precise scanning for dramatically 
increasing allocation sizes by adding one pointer. Pushing an allocation 
into the next bin of twice the size only happens for allocations whose 
size happens to match a bin size (or that were designed with these GC 
internals in mind); for all others the overhead of adding the pointer is 
zero.

In general we are already wasting 25% on average with these power-of-2 
bin sizes; we could mitigate the problem by adding some intermediate bin 
sizes.
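
To illustrate with hypothetical bin tables (the sizes are examples, not 
the actual GC tables): a 33-byte request wastes 31 bytes in a pure 
power-of-2 scheme, but only 15 once a 48-byte bin exists between 32 and 
64.

```d
// Illustrative bin tables and lookup; not the GC's actual code.
immutable size_t[] pow2Bins  = [16, 32, 64, 128, 256];
immutable size_t[] mixedBins = [16, 24, 32, 48, 64, 96, 128, 192, 256];

size_t binFor(size_t n, immutable size_t[] bins)
{
    foreach (b; bins)
        if (n <= b)
            return b;
    assert(0, "request too large for the small-object bins");
}

unittest
{
    assert(binFor(33, pow2Bins)  == 64); // 31 of 64 bytes wasted
    assert(binFor(33, mixedBins) == 48); // only 15 bytes wasted
}
```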

One test I've run to measure GC performance is testgc3 from the test 
suite, which just adds entries to a uint[uint] associative array. A node 
in the hash list is 16 bytes for 32-bit processes, but 36 bytes for 
64-bit processes due to pessimistic alignment assumptions. This causes 
the GC to make 64-byte allocations, with horrible results for both 
memory and performance of the 64-bit version in comparison with the 
32-bit one.
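
Spelled out with the power-of-2 rounding (a sketch; 16 bytes assumed as 
the smallest bin):

```d
// Illustrative rounding of a request to its power-of-2 bin.
size_t roundToBin(size_t n)
{
    size_t b = 16; // smallest bin size assumed here
    while (b < n)
        b *= 2;
    return b;
}

unittest
{
    assert(roundToBin(16) == 16); // 32-bit node: fits its bin exactly
    assert(roundToBin(36) == 64); // 64-bit node: 28 of 64 bytes wasted
}
```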



