More embarrassing microbenchmarks for D's GC.

Sean Kelly sean at invisibleduck.org
Tue Jun 10 11:19:09 PDT 2008


== Quote from Leandro Lucarella (llucax at gmail.com)'s article
> Sean Kelly, el  9 de junio a las 22:53 me escribiste:
> > == Quote from Leandro Lucarella (llucax at gmail.com)'s article
> > > But there are a few other results I can't explain:
> > > 1) Why is the D GC version (disabled or not) ~25% slower than the D version
> > >    that uses malloc when iterating the list? There shouldn't be any GC
> > >    activity in that part. Could it be some GC locality issue that yields
> > >    more cache misses?
> >
> > I think it may have more to do with the allocation strategy in the GC.  It obtains
> > memory in chunks from the OS, and each chunk is typically a max of 8MB.  So
> > for a test like this the D GC will end up hitting the OS quite a few times asking
> > for more memory.  If I had to guess I'd say that malloc has a more efficient
> > strategy here.  If you're interested, try running the same test using Tango
> with and without a call to tango.core.Memory.GC.reserve() for the amount of
> > memory you expect the app to use before the loop.
> Really? I thought it was more collection-related. I don't know exactly how
> the D GC is implemented, but I thought that, as you say, it allocates a
> chunk of memory, hands you a piece of that memory on each new, and
> when it runs out of memory, it runs a collection cycle. Only when that
> cycle can't recover enough unused memory does it allocate more.
> Considering that the version with GC disabled is almost as fast as the
> malloc version, and that the time grows exponentially (graph attached)
> with the number of nodes, it looks like the problem is the GC
> collection cycles to me. Is the GC algorithm exponential?
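The GC-disabled comparison described above could be reproduced in Tango roughly as follows; the node layout and element count are made up for illustration, not taken from the original benchmark:

```d
// Rough sketch (Tango, D1): build a linked list with collection cycles
// suppressed, to separate collection cost from allocation cost.
import tango.core.Memory;

struct Node
{
    int   value;
    Node* next;
}

void main()
{
    GC.disable();               // no collection cycles during the loop
    scope (exit) GC.enable();   // re-enable the collector afterwards

    Node* head = null;
    for (int i = 0; i < 1_000_000; ++i)
    {
        auto n = new Node;      // still allocated from the GC heap
        n.value = i;
        n.next  = head;
        head    = n;
    }
}
```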

Oh okay.  I was replying to the post that said the malloc version was still
much faster than D with the GC disabled.  But if they're comparable then all
the better.  This is what I would expect anyway.
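The reserve() experiment suggested in the quoted text might look something like this; the 64 MB figure is a guess at the app's working set, not a measured value:

```d
// Rough sketch (Tango, D1): pre-reserve GC memory before the allocation
// loop so the collector needn't grow its pools from the OS mid-run.
import tango.core.Memory;

struct Node
{
    int   value;
    Node* next;
}

void main()
{
    // Reserve roughly the amount of memory the test is expected to use.
    GC.reserve( 64 * 1024 * 1024 );

    Node* head = null;
    for (int i = 0; i < 1_000_000; ++i)
    {
        auto n = new Node;
        n.value = i;
        n.next  = head;
        head    = n;
    }
}
```

Timing this with and without the GC.reserve() call should show how much of the slowdown comes from the GC repeatedly asking the OS for more memory.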

As for the GC algorithm being exponential... all nodes of an AA contain
pointers, so they will all be scanned by the GC.  Given this and the chance
of false positives, I would expect performance to decrease in a superlinear
manner.  But an exponential slow-down still seems a bit extreme.


Sean



More information about the Digitalmars-d mailing list