rt_finalize WTFs?

Sun Dec 4 19:07:14 PST 2011

On Mon, 05 Dec 2011 02:46:27 +0100, dsimcha <dsimcha at yahoo.com> wrote:

> I'm at my traditional pastime of trying to speed up D's garbage  
> collector again, and I've stumbled on the fact that rt_finalize is  
> taking up a ridiculous share of the time (~30% of total runtime) on a  
> benchmark where huge numbers of classes **that don't have destructors**  
> are being created and collected.  Here's the code to this function, from  
> lifetime.d:
>
> extern (C) void rt_finalize(void* p, bool det = true)
> {
>      debug(PRINTF) printf("rt_finalize(p = %p)\n", p);
>
>      if (p) // not necessary if called from gc
>      {
>          ClassInfo** pc = cast(ClassInfo**)p;
>
>          if (*pc)
>          {
>              ClassInfo c = **pc;
>              byte[]    w = c.init;
>
>              try
>              {
>                  if (det || collectHandler is null ||  
> collectHandler(cast(Object)p))
>                  {
>                      do
>                      {
>                          if (c.destructor)
>                          {
>                              fp_t fp = cast(fp_t)c.destructor;
>                              (*fp)(cast(Object)p); // call destructor
>                          }
>                          c = c.base;
>                      } while (c);
>                  }
>                  if ((cast(void**)p)[1]) // if monitor is not null
>                      _d_monitordelete(cast(Object)p, det);
>                  (cast(byte*) p)[0 .. w.length] = w[];  // WTF?
>              }
>              catch (Throwable e)
>              {
>                  onFinalizeError(**pc, e);
>              }
>              finally  // WTF?
>              {
>                  *pc = null; // zero vptr
>              }
>          }
>      }
> }
>
> Getting rid of the stuff I've marked with //WTF? comments (namely the  
> finally block and the re-initializing of the memory occupied by the  
> finalized object) speeds things up by ~15% on the benchmark in question.  
>   Why do we care what state the blob of memory is left in after we  
> finalize it?  I can kind of see that we want to clear things if  
> delete/clear was called manually and we want to leave the object in a  
> state that doesn't look valid.  However, this has significant  
> performance costs and IIRC is already done in clear() and delete is  
> supposed to be deprecated.  Furthermore, I'd like to get rid of the  
> finally block entirely, since I assume its presence and the effect on  
> the generated code is causing the slowdown, not the body, which just  
> assigns a pointer.
>
> Is there any good reason to keep this code around?

Not for the try block. Since errors are not recoverable, you don't need to
care about zeroing the vtbl pointer; alternatively, you could just copy that
code into the catch handler.
Doing so seems to cause fewer spilled variables.
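
A minimal sketch of the second option, with the pointer store duplicated
into the catch handler instead of living in a finally block. The names
finalizeSketch and runDtors are hypothetical and only model the control
flow; this is not the druntime code, and the rethrow stands in for
onFinalizeError:

```d
// Hypothetical stand-in for the finalizer's control flow; not druntime source.
void finalizeSketch(void** pc, void delegate() runDtors)
{
    try
    {
        runDtors();   // destructor chain, monitor cleanup, etc.
    }
    catch (Throwable e)
    {
        *pc = null;   // error path: zero the pointer before propagating
        throw e;      // stands in for onFinalizeError
    }
    *pc = null;       // normal path: same store, outside any handler
}

unittest
{
    void* dummy;
    void* vptr = &dummy;

    // Normal path: pointer is zeroed after the destructors ran.
    finalizeSketch(&vptr, delegate() {});
    assert(vptr is null);

    // Error path: pointer is zeroed and the error still propagates.
    vptr = &dummy;
    bool thrown = false;
    try
    {
        finalizeSketch(&vptr, delegate() { throw new Exception("dtor failed"); });
    }
    catch (Exception)
    {
        thrown = true;
    }
    assert(thrown && vptr is null);
}
```

Whether this actually reduces spills depends on the compiler, so it should
be checked against the generated code.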

The most expensive part is the call to memcpy through the PLT; replace it
with something inlineable.
Zeroing is not much faster than copying init[] for small classes.
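
One way "something inlineable" could look is a plain word-by-word copy
loop. This is only a sketch: copyInit is a hypothetical helper, not a
druntime function, and it assumes both pointers are word-aligned. An
optimizing backend may still turn such a loop back into a memcpy call, so
the generated code needs checking.

```d
// Hypothetical helper, not druntime code.
// Assumption: dst and src.ptr are both aligned to size_t.
void copyInit(void* dst, const(void)[] src)
{
    enum W = size_t.sizeof;
    auto d = cast(size_t*) dst;
    auto s = cast(const(size_t)*) src.ptr;
    immutable words = src.length / W;

    foreach (i; 0 .. words)               // word-sized stores
        d[i] = s[i];

    auto db = cast(ubyte*) dst;
    auto sb = cast(const(ubyte)*) src.ptr;
    foreach (i; words * W .. src.length)  // trailing bytes, if any
        db[i] = sb[i];
}

unittest
{
    size_t[3] srcBuf = [size_t.max, 42, size_t.max];
    size_t[3] dstBuf;                        // zero-initialized
    immutable n = 2 * size_t.sizeof + 1;     // deliberately not a word multiple

    const(void)[] image = srcBuf[];
    copyInit(dstBuf.ptr, image[0 .. n]);

    auto db = cast(ubyte*) dstBuf.ptr;
    auto sb = cast(ubyte*) srcBuf.ptr;
    assert(db[0 .. n] == sb[0 .. n]);        // the first n bytes match
    assert(db[n] == 0);                      // nothing past n was written
}
```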

At least zeroing should be worth it, unless the GC would not scan the  
memory otherwise.

gcbench/tree1 => 41.8s
=> https://gist.github.com/1432117
=> gcbench/tree1 => 33.4s

Please add useful benchmarks to druntime.

martin