rt_finalize WTFs?

Mon Dec 5 09:56:50 PST 2011

On Mon, 05 Dec 2011 09:14:00 +0100, Rainer Schuetze <r.sagitario at gmx.de>  
wrote:

> Last time I looked at it, the try/catch/finally block was rather  
> expensive because it always invokes the exception handler unwinding  
> mechanism, even if no exception occurs.
> Try moving the try/catch block out of the loops that call rt_finalize.  
> (maybe just remove it, onFinalizeError just rethrows, and it seems no one  
> is using the information that the exception is thrown from inside the  
> finalizer)
>
Just an unconditional jump into the finally body here, but
it still affects register assignment.
Moving the exception handler to the sweep scope seems promising;
it could avoid a lot of register saves.
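
A minimal sketch of what hoisting the handler to sweep scope could look
like. Everything here is a hypothetical stand-in (finalizeOne,
finalizeGuarded, sweepPerObject, sweepHoisted, and the int slots playing
the role of objects); none of it is druntime code. Both variants stop at
the first failing finalizer, mirroring the remark that onFinalizeError
just rethrows. Whether the hoisted form is measurably faster depends on
the compiler and platform and is not shown here.

```d
// Hypothetical stand-in for a finalizer call; a negative slot simulates
// a destructor that throws.
void finalizeOne(ref int slot)
{
    if (slot < 0)
        throw new Exception("finalizer threw");
    slot = 0; // pretend the object has been torn down
}

// Per-object handler: a try scope is entered for every finalized object.
void finalizeGuarded(ref int slot)
{
    try
    {
        finalizeOne(slot);
    }
    catch (Exception e)
    {
        // Wrap and rethrow, keeping the original as the next in chain.
        throw new Exception("finalization error", e);
    }
}

// Sweep that pays for the handler once per object.
size_t sweepPerObject(int[] slots)
{
    size_t done;
    foreach (ref s; slots)
    {
        finalizeGuarded(s);
        ++done;
    }
    return done;
}

// Sweep with a single handler around the whole loop; the loop body
// itself contains no try scope.
size_t sweepHoisted(int[] slots)
{
    size_t done;
    try
    {
        foreach (ref s; slots)
        {
            finalizeOne(s);
            ++done;
        }
    }
    catch (Exception e)
    {
        throw new Exception("finalization error", e);
    }
    return done;
}
```

The price of the hoisted form is that the handler no longer knows which
object was being finalized unless the loop records it separately.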

I appreciate the recursion during mark; I wanted to do this myself
some time ago, but expected a little more gain.

Some more ideas:

  - Do a major refactoring of the GC code, making it less resistant
    to change. Adding sanity checks or unit tests would be great.
    This would probably reveal some hidden performance issues.

  - Add more realistic GC benchmarks; this just requires adding them to
    druntime/test/gcbench using the new runbench. The tree1 benchmark
    mainly uses homogeneous classes, so it is very synthetic.

  - There is one binary search pool lookup for every scanned address in
    range. There should be a lot to gain here, but it's difficult. It
    needs a multilevel mixture of bitset/hashtab.

  - Reduce the GC roots range. I will have to work on this for
    shared library support anyhow.
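
To make the pool lookup point concrete, here is a rough sketch of a
binary search over sorted, non-overlapping pools, plus a coarse flag
table that rejects addresses before the search runs. All names (Pool,
findPool, CoarseFilter, lookup) and the 1 MiB chunk size are assumptions
made for illustration; this is not the druntime implementation, and the
flat table is only a single-level stand-in for the multilevel
bitset/hashtab structure mentioned above.

```d
// Hypothetical pool descriptor: covers addresses in [base, top).
struct Pool
{
    size_t base;
    size_t top;
}

// Binary search over pools sorted by base address.
// Returns the pool index, or -1 if addr lies outside every pool.
ptrdiff_t findPool(const(Pool)[] pools, size_t addr)
{
    size_t lo = 0, hi = pools.length;
    while (lo < hi)
    {
        immutable mid = lo + (hi - lo) / 2;
        if (addr < pools[mid].base)
            hi = mid;
        else if (addr >= pools[mid].top)
            lo = mid + 1;
        else
            return cast(ptrdiff_t) mid;
    }
    return -1;
}

// One flag per chunk: true if any pool overlaps that chunk.
// A flat table like this only works when pools are close together,
// which is why a real design would need several levels.
struct CoarseFilter
{
    enum chunkShift = 20; // 1 MiB chunks, an arbitrary choice
    size_t minChunk;
    bool[] covered;

    this(const(Pool)[] pools)
    {
        if (pools.length == 0)
            return;
        minChunk = pools[0].base >> chunkShift;
        immutable maxChunk = (pools[$ - 1].top - 1) >> chunkShift;
        covered = new bool[maxChunk - minChunk + 1];
        foreach (p; pools)
        {
            immutable first = p.base >> chunkShift;
            immutable last = (p.top - 1) >> chunkShift;
            foreach (c; first .. last + 1)
                covered[c - minChunk] = true;
        }
    }

    // False means "definitely in no pool"; true means "search needed".
    bool mayContain(size_t addr) const
    {
        immutable c = addr >> chunkShift;
        return c >= minChunk && c - minChunk < covered.length
            && covered[c - minChunk];
    }
}

// Filter first, then fall back to the binary search.
ptrdiff_t lookup(ref const CoarseFilter f, const(Pool)[] pools, size_t addr)
{
    return f.mayContain(addr) ? findPool(pools, addr) : -1;
}
```

The filter can report false positives (an address in a covered chunk but
outside the pool), so the binary search remains the authority; the hope
is only that most non-heap values are rejected cheaply.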

martin

> On 05.12.2011 02:46, dsimcha wrote:
>> I'm at my traditional pastime of trying to speed up D's garbage
>> collector again, and I've stumbled on the fact that rt_finalize is
>> taking up a ridiculous share of the time (~30% of total runtime) on a
>> benchmark where huge numbers of classes **that don't have destructors**
>> are being created and collected. Here's the code to this function, from
>> lifetime.d:
>>
>> extern (C) void rt_finalize(void* p, bool det = true)
>> {
>>     debug(PRINTF) printf("rt_finalize(p = %p)\n", p);
>>
>>     if (p) // not necessary if called from gc
>>     {
>>         ClassInfo** pc = cast(ClassInfo**)p;
>>
>>         if (*pc)
>>         {
>>             ClassInfo c = **pc;
>>             byte[] w = c.init;
>>
>>             try
>>             {
>>                 if (det || collectHandler is null || collectHandler(cast(Object)p))
>>                 {
>>                     do
>>                     {
>>                         if (c.destructor)
>>                         {
>>                             fp_t fp = cast(fp_t)c.destructor;
>>                             (*fp)(cast(Object)p); // call destructor
>>                         }
>>                         c = c.base;
>>                     } while (c);
>>                 }
>>                 if ((cast(void**)p)[1]) // if monitor is not null
>>                     _d_monitordelete(cast(Object)p, det);
>>                 (cast(byte*) p)[0 .. w.length] = w[]; // WTF?
>>             }
>>             catch (Throwable e)
>>             {
>>                 onFinalizeError(**pc, e);
>>             }
>>             finally // WTF?
>>             {
>>                 *pc = null; // zero vptr
>>             }
>>         }
>>     }
>> }
>>
>> Getting rid of the stuff I've marked with //WTF? comments (namely the
>> finally block and the re-initializing of the memory occupied by the
>> finalized object) speeds things up by ~15% on the benchmark in question.
>> Why do we care what state the blob of memory is left in after we
>> finalize it? I can kind of see that we want to clear things if
>> delete/clear was called manually and we want to leave the object in a
>> state that doesn't look valid. However, this has significant performance
>> costs and IIRC is already done in clear() and delete is supposed to be
>> deprecated. Furthermore, I'd like to get rid of the finally block
>> entirely, since I assume its presence and the effect on the generated
>> code is causing the slowdown, not the body, which just assigns a pointer.
>>
>> Is there any good reason to keep this code around?