Cross-post from druntime: Mixing GC and non-GC in D. (AKA, "don't touch GC-references from DTOR, preferably don't use DTOR at all")

Ulrik Mikaelsson ulrik.mikaelsson at gmail.com
Wed Dec 15 13:23:24 PST 2010


Cross-posting after request on the druntime list:
------------------------------

Hi,

DISCLAIMER: I'm developing for D1/Tango. It is possible these issues
are already resolved for D2/druntime. If so, I've failed to find any
information about it; please do tell.

Recently, I've been trying to optimize my application by moving some
resource allocation (file descriptors, for one) from the GC to
reference counting. I've hit some problems.

Problem
=======

Basically, the core of all my problems is something expressed in
http://bartoszmilewski.wordpress.com/2009/08/19/the-anatomy-of-reference-counting/
as "An object’s destructor must not access any garbage-collected
objects embedded in it.".

This becomes a real problem for various allocation schemes, be it
hierarchical allocation, reference counting, or a bunch of other custom
resource schemes. _It renders D destructors mostly useless for
anything but mere C-binding cleanup._

Consequence
===========
For the reference-counted example, the only working solution is to
have the counted object malloc'ed instead of GC-allocated. One could
argue that "correct" programs with reference counting should do their
memory management completely explicitly anyway, and yes, that's largely
true. The struct destructor of D2 makes the C++ "smartptr" construct
possible, making refcount use mostly natural and automatic.
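
As a rough illustration of that construct, here is a minimal sketch,
assuming D2 (postblit and struct destructors). Payload, Ref and
closedCount are made-up names for this example, and the payload
deliberately holds no GC references:

```d
import core.stdc.stdlib : malloc, free;

// Hypothetical malloc-backed payload; holds no GC references.
struct Payload
{
    int fd;    // the scarce resource, e.g. a file descriptor
    int refs;  // manual reference count
}

int closedCount; // counts destroyed payloads, for demonstration only

struct Ref
{
    private Payload* p;

    this(int fd)
    {
        p = cast(Payload*) malloc(Payload.sizeof);
        assert(p !is null, "out of memory");
        p.fd = fd;
        p.refs = 1;
    }

    // Postblit: runs after a copy has been made, so bump the count.
    this(this)
    {
        if (p !is null) p.refs++;
    }

    // Struct destructor: runs deterministically at scope exit.
    ~this()
    {
        if (p is null) return;
        if (--p.refs == 0)
        {
            closedCount++;   // a real version would close p.fd here
            free(p);
        }
        p = null;
    }

    // Current number of owners (0 for an empty Ref).
    int count() const { return p is null ? 0 : p.refs; }
}

void main()
{
    {
        auto a = Ref(3);
        {
            auto b = a;            // copy: count goes to 2
            assert(a.count == 2);
        }                          // b destroyed: count back to 1
        assert(a.count == 1);
        assert(closedCount == 0);
    }                              // a destroyed: payload freed
    assert(closedCount == 1);
}
```

This works precisely because Payload contains nothing the GC owns; the
trouble starts once it needs, say, a GC-allocated array.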

However, it also means that the refcounted object itself can never
use GC-allocated structures, such as almost ANYTHING from the stdlib!
In effect, as soon as you leave the GC behind, you leave over half of
the useful things in D behind.

This is a real bummer. What first attracted me to D, and what I believe
is still one of its key strengths, is the possibility of hybrid
GC/other memory schemes. It allows the developer to write up
something quick-n-dirty, and then improve it in the places where it's
actually needed, such as for open files, GUI context handles, or
other expensive/limited resources.

As another indication that this really is a problem: in Tango, this has
led to the introduction of an additional destructor-type method,
"dispose", which AFAICT does what the destructor should have done,
but is only invoked for deterministic destruction by "delete" or
scope guards. IMO, that can only lead to a world of pain and
misunderstandings, having two different "destructors" run depending on
WHY the object was destroyed.

Proposed Solution
=================
Back to the core problem "An object’s destructor must not access any
garbage-collected objects embedded in it.".

As far as I can tell (but I'm no GC expert), this is a direct effect
of the current implementation of the GC, more specifically the loop
starting at http://www.dsource.org/projects/druntime/browser/trunk/src/gc/gcx.d#L2492.
In this loop, all non-marked objects get their finalizers run, and
immediately afterwards they get freed. If I understand the code right,
this freeing is what's actually causing the problems: if the order in
the loop doesn't match the order of references between the freed
objects (which it can't, especially for circular references), the loop
might free a memory area before calling the next finalizer, which then
attempts to use the just-freed reference.

Wouldn't it instead be possible to split this loop into two separate
loops, the first calling all finalizers, letting them run with all
objects still intact, and the second, run afterwards, actually
freeing the objects? AFAICT, this would resolve the entire problem,
allowing for much more mixed use of allocation strategies as well as
making the destructor much more useful.
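
To make the difference concrete, here is a toy model in D2. It is NOT
druntime's code: Obj, sweepInterleaved, sweepTwoPass and makeCycle are
made-up names, and "freeing" is modelled by setting a flag, so the
sketch only illustrates the ordering issue as I understand it.

```d
import std.stdio;

// Toy stand-in for a GC-managed object (hypothetical, not a druntime type).
class Obj
{
    string name;
    Obj other;          // reference to another collectable object
    bool freed;         // models "memory has been released"
    bool sawFreedPeer;  // set if the finalizer touched a freed object

    this(string name) { this.name = name; }

    // Plays the role of the destructor: it looks at `other`.
    void finalize()
    {
        if (other !is null && other.freed)
            sawFreedPeer = true;
    }
}

// Behaviour as described above: finalize and free in one loop.
void sweepInterleaved(Obj[] garbage)
{
    foreach (o; garbage)
    {
        o.finalize();
        o.freed = true;
    }
}

// Proposed behaviour: run every finalizer first, then free everything.
void sweepTwoPass(Obj[] garbage)
{
    foreach (o; garbage)
        o.finalize();
    foreach (o; garbage)
        o.freed = true;
}

// Builds two objects that reference each other (a cycle).
Obj[] makeCycle()
{
    auto a = new Obj("a");
    auto b = new Obj("b");
    a.other = b;
    b.other = a;
    return [a, b];
}

void main()
{
    auto one = makeCycle();
    sweepInterleaved(one);
    writeln("interleaved: b saw freed peer = ", one[1].sawFreedPeer);

    auto two = makeCycle();
    sweepTwoPass(two);
    writeln("two-pass:    b saw freed peer = ", two[1].sawFreedPeer);
}
```

In the interleaved sweep, "b" is finalized after "a" has already been
marked freed, so its finalizer sees a dead peer; in the two-pass sweep
no finalizer does. With a cycle, no loop order can avoid this in the
interleaved version.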

Alternate Solution
==================
On the druntime list, Vladimir suggested something similar could be
achieved by simply creating a custom allocator which automatically
adds its pool to the GC roots. This would solve my problems
satisfactorily, and it is probably what I'm going to do for my immediate
problem.

I believe (again, no GC expert) it may even have the advantage of
relieving some pressure from the GC, in terms of the objects it really
has to track.
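
A minimal sketch of what such an allocator might look like, assuming
D2/druntime's core.memory (GC.addRange / GC.removeRange) rather than
the D1/Tango runtime I'm actually using. Handle, acquire and release
are hypothetical names, and a single malloc'ed record stands in for a
whole pool:

```d
import core.memory : GC;
import core.stdc.stdlib : malloc, free;

// Hypothetical resource record living outside the GC heap.
struct Handle
{
    int fd;        // the scarce resource, e.g. a file descriptor
    char[] name;   // GC-allocated data owned by this record
    int refs;      // manual reference count
}

// Allocate with malloc and tell the GC to scan the block for references.
Handle* acquire(int fd, string name)
{
    auto h = cast(Handle*) malloc(Handle.sizeof);
    assert(h !is null, "out of memory");
    *h = Handle.init;
    GC.addRange(h, Handle.sizeof);   // GC now scans this block
    h.fd = fd;
    h.name = name.dup;               // GC allocation referenced from malloc memory
    h.refs = 1;
    return h;
}

// Drop one reference; on the last one, clean up deterministically.
// Returns true if the record was destroyed.
bool release(Handle* h)
{
    if (--h.refs > 0)
        return false;
    // Safe to touch h.name here: the GC cannot have collected it.
    // (A real version would close h.fd at this point.)
    GC.removeRange(h);
    free(h);
    return true;
}

void main()
{
    auto h = acquire(3, "example");
    h.refs++;                 // a second owner
    assert(!release(h));      // first release keeps it alive
    GC.collect();             // name survives: the range is registered
    assert(h.name == "example");
    assert(release(h));       // last release frees it
}
```

The cleanup in release() may use h.name freely, because the record is
destroyed by the program at a known point and not by the GC's sweep.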

However, this approach has the disadvantages that
 * the GC can no longer be used as a "fallback/safety net", putting
an extra burden of correctness on the programmer (perhaps a good thing?)
 * destructors on "regular" GC objects still cannot touch related
objects. Consider the following example. Yes, it's pretty bad design,
but it's non-obvious why it's invalid, and one wouldn't intuitively
expect it to cause segfaults.

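class List {
  int count;
  // Non-static nested class: each Entry holds a hidden reference
  // to its enclosing List.
  class Entry {
    this() { count++; }
    // Unsafe when run by the GC: the enclosing List may already
    // have been freed when this executes.
    ~this() { count--; }
  }
}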

------------------------------

I strongly believe the language/runtime should not needlessly lay
"non-obvious" traps like this for the developer. For a C++ convert it
is quite counterintuitive, and even if you know about it, it's tricky
to work around.

I think both solutions have their merits, but short of serious
performance issues with the first proposed solution, I think it's
preferable. The second solution should still be documented, and perhaps
get some support (i.e. a separate base class or mixin) from the
standard libraries.

Ideas, opinions? Perhaps this has been discussed before?

Regards
/ Ulrik

