draft proposal for ref counting in D
Walter Bright
newshound2 at digitalmars.com
Wed Oct 9 18:39:28 PDT 2013
On 6/25/2013 6:09 PM, Michel Fortin wrote:
> ## Some general comments
>
> While it's a start, this is hardly enough for Objective-C. Mostly for legacy
reasons, most Objective-C methods return autoreleased objects (deferred release
using an autorelease pool) based on a naming convention. Also, Objective-C
objects can't be allocated from the D heap, so to avoid cycles we need weak
pointers. More on Objective-C later.
>
> While it's good that a direct call to AddRef/Release is forbidden in @safe
code, I think it should be forbidden in @system code too. The reason is that if
the compiler is inserting calls to these automatically and you're also adding
your own explicitly in the same function, it becomes practically impossible to
reason about the reference counts, short of looking at the assembly. Instead, I
think you should create a @noarc attribute for functions: it'll prevent the
compiler from inserting any of those calls so it becomes the responsibility of
the author to make those calls (which are then allowed). @noarc would be
incompatible with @safe, obviously.
It's a good point, but adding such an attribute to the function may be too
coarse, for one, and may cause composition problems, for another. Maybe just
disallowing it altogether is the best solution.
>
> Finally, that's a nitpick but I wish you'd use function names that fit D
better, such as opRetain and opRelease. Then you can add a "final void
opRetain() { AddRef(); }" function to the IUnknown COM interface and we could do
the same for Objective-C.
Makes sense.
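
As a rough illustration of the renaming suggested above, the forwarding functions might look something like this in D. The names RefCountedBase and Widget are hypothetical stand-ins invented for this sketch (the real IUnknown lives in the COM bindings); only opRetain, opRelease, AddRef and Release come from the discussion.

```d
// Hypothetical stand-in for the COM IUnknown interface.
interface RefCountedBase
{
    uint AddRef();
    uint Release();

    // D-style names that simply forward to the COM-style ones.
    final void opRetain()  { AddRef(); }
    final void opRelease() { Release(); }
}

// Hypothetical implementation, used only to exercise the forwarding.
class Widget : RefCountedBase
{
    private uint count = 1;   // the creator holds the first reference

    uint AddRef()  { return ++count; }
    uint Release() { return --count; }

    uint refs() const { return count; }
}
```

Because the forwarding functions are final, implementers only ever write AddRef and Release; the op* names come for free through the interface.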
>
> ## Objective-C autoreleased objects
>
> Objective-C is a special case. In Objective-C we need to know whether the
returned object of a function is already retained or if it is deferred released
(autoreleased). This is easily deduced from the naming convention.
Occasionally, we might need to create autorelease pools too, but that can
probably stay @system.
>
> (Note: all this idea of autoreleased objects might sound silly, but it was a
great help before ARC, and Objective-C ARC has to be compatible with legacy code
so it conforms to those conventions.)
>
> You can easily implement ARC for COM using an implementation of ARC for
Objective-C, but the reverse is not true, because COM does not have this (old but
still needed) concept of autorelease pools and deferred release, where you need
to know at each function boundary whether returned values (including those
returned through pointer arguments) are expected to be retained or not.
>
> If I were you Walter, I would just not care about Objective-C idioms while
implementing this feature at first. It'll have to be special cased anyway.
Here's how I expect that'll be done:
From reading over that Clang document, O-C ARC is far more complex than I'd
anticipated. I think it is way beyond what we'd want in regular D. It also comes
with all kinds of pointer and function annotations - something I strongly want
to avoid.
>
> What will need to be done later when adding Objective-C support is to add an
internal "autoreleasedReturn" flag to a function that'll make codegen call
"autorelease" in the callee when returning an object and "retain" in the caller
where it receives an object from a function with that flag. Also, the same flag
and behaviour are needed for out parameters (to mimic those cases where an
object is returned by pointer). That flag will then be set automatically
internally depending on the function name (only for Objective-C member
functions), and it should be possible to override it explicitly with an
attribute or a pragma of some sort. This is what Clang is doing, and we must
match that to allow things to work.
I agree that this complexity should only be in O-C code.
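
The callee/caller split described in the quoted paragraph can be modeled in plain D, without any Objective-C runtime. This is only a sketch: PooledObj, AutoreleasePool, makeObject and receive are hypothetical names, and the pool here is a simple array rather than anything resembling Apple's implementation.

```d
// Hypothetical object with an intrusive count, standing in for an
// Objective-C object.
class PooledObj
{
    int count = 1;            // +1 from creation
}

// Hypothetical autorelease pool: releases are deferred until drain().
struct AutoreleasePool
{
    PooledObj[] pending;

    PooledObj autorelease(PooledObj o)
    {
        pending ~= o;         // remember that one release is owed
        return o;
    }

    void drain()
    {
        foreach (o; pending)
            --o.count;        // the deferred release happens here
        pending = null;
    }
}

// Callee side: a function flagged "autoreleasedReturn" hands back an
// object whose release has been deferred to the pool.
PooledObj makeObject(ref AutoreleasePool pool)
{
    auto o = new PooledObj;          // count == 1
    return pool.autorelease(o);      // still 1, release is pending
}

// Caller side: retain on receipt so the object survives the drain.
PooledObj receive(ref AutoreleasePool pool)
{
    auto o = makeObject(pool);
    ++o.count;                       // retain: count == 2
    return o;
}
```

After the pool drains, the caller's retain is the only reference left, which is the balance the flag is meant to preserve.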
>
> Checking for null is redundant in the Objective-C case: that check is done by
the runtime. That's of minor importance, but it might impact performance and
should probably be special-cased in this case.
>
> ## Optimizations
>
> With Apple's implementation of reference counting (using global hash tables
protected by spin locks), it is more efficient to update many counters in one
operation. The codegen for Objective-C ARC upon assignment to a variable calls
"objc_storeStrong(id *object, id value)", incrementing and decrementing the two
counters presumably in one operation (as well as replacing the content of the
variable pointed by the first argument with the new value).
>
> Ideally, the codegen for Objective-C ARC in D would call the same functions
so we have the same performance. This means that codegen should call
"objc_retain" when first initializing a variable, "objc_storeStrong" when doing
an assignment, and "objc_release" when destructing a variable.
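
The assignment case can be sketched in D as follows. This only models the sequence the Clang ARC document gives for objc_storeStrong (retain the new value, store it, release the old one); it does not call the real runtime, and StrongObj, retain, release and storeStrong are hypothetical D names with an intrusive count assumed for simplicity.

```d
// Hypothetical counted object.
class StrongObj { int count = 0; }

void retain(StrongObj o)  { if (o !is null) ++o.count; }
void release(StrongObj o) { if (o !is null) --o.count; }

// Model of a strong assignment: retain the new value first, store it,
// then release whatever the variable held before.
void storeStrong(ref StrongObj slot, StrongObj value)
{
    StrongObj old = slot;
    retain(value);
    slot = value;
    release(old);
}
```

Retaining before releasing matters when the old and new values are the same object: releasing first could drop the count to zero before the retain runs.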
>
> As for returning autoreleased objects, there are two functions to choose from
depending on whether the object needs to be retained at the same time or not.
(In general, the object needs to be retained prior to autoreleasing if it comes
from a variable not part of the function's stack frame.)
>
> Here's Clang's documentation for how it implements ARC:
> http://clang.llvm.org/docs/AutomaticReferenceCounting.html
>
> ## Objective-C weak pointers
>
> Weak pointers are essential in order to break retain cycles in Objective-C
where there is no GC. They are implemented with the same kind of function calls
as strong pointers. Unfortunately, Apple's Objective-C implementation won't sit
well with D the way it works now.
>
> Weak pointers are implemented in Objective-C by registering the address of
the pointer with the runtime. This means that when a pointer is moved from one
location to another, the runtime needs to be notified of that through a call to
objc_moveWeak. This breaks one assumption of D that you can move memory at will
without calling anything.
>
> While we could still implement a working weak pointer with a template struct,
that struct would have to allocate a pointer on the heap (where it is guaranteed
to not move) so it can store the true weak pointer recognized by the runtime.
I'm not sure that would be acceptable, but at least it would work.
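
A minimal sketch of that template-struct idea, assuming a toy registry in place of the Objective-C runtime: Weak, registeredSlots and clearWeakRefsTo are hypothetical names, and the registry is just an associative array of slot addresses. The point it illustrates is that the struct itself can be copied or moved freely because the registered address is the heap cell, not the struct.

```d
import core.stdc.stdlib : malloc, free;

// Hypothetical stand-in for the runtime's table of registered weak slots.
bool[void**] registeredSlots;

struct Weak
{
    private void** slot;   // heap cell whose address the "runtime" knows

    static Weak create(void* target)
    {
        Weak w;
        w.slot = cast(void**) malloc((void*).sizeof);
        assert(w.slot !is null, "out of memory");
        *w.slot = target;
        registeredSlots[w.slot] = true;   // register the stable address
        return w;
    }

    void* get() { return *slot; }

    void dispose()
    {
        registeredSlots.remove(slot);
        free(slot);
        slot = null;
    }
}

// What the runtime would do when the target dies: zero every
// registered slot that still points at it.
void clearWeakRefsTo(void* target)
{
    foreach (s; registeredSlots.byKey)
        if (*s is target)
            *s = null;
}
```

The sketch ignores who owns the heap cell when the struct is copied; a real version would need a postblit or a count on the cell, which is part of the cost mentioned above.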
>
> ## More on reference counting
>
> I feel like I should share some of my thoughts here about a broader use of
reference counting in D.
>
> First, we don't have to assume the reference counter has to be part of the
object. Apple implements reference counting using global hash tables where the
key is the address. It works very well.
>
> If we added a hash table like this for all memory allocated from the GC, we'd
just have to find the base address of any memory block to get to its reference
counter. I know you were designing with only classes in mind, but I want to
point out that it is possible to reference-count everything the GC allocates if
we want to.
D would need manual, RC and GC to coexist peacefully.
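
For reference, the side-table idea from the quoted text can be sketched with a D associative array keyed by address. sideTable, retainAddr and releaseAddr are hypothetical names; a real implementation would need locking and would free the block when the count reaches zero.

```d
// Hypothetical side table: address of the object -> reference count.
size_t[void*] sideTable;

// Increment the count for p, creating the entry on first use.
size_t retainAddr(void* p)
{
    if (auto c = p in sideTable)
        return ++*c;
    sideTable[p] = 1;
    return 1;
}

// Decrement the count for p and drop the entry when it reaches zero.
size_t releaseAddr(void* p)
{
    auto c = p in sideTable;
    assert(c !is null, "release without a matching retain");
    if (--*c == 0)
    {
        sideTable.remove(p);   // last reference gone
        return 0;
    }
    return *c;
}
```

Nothing in the object's layout changes, which is what makes the scheme applicable to memory that was never designed to be counted.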
>
> The downside is that every assignment to a pointer anywhere has to call a
function. While this is some overhead, it is more predictable than overhead from
a GC scan and would be preferred in some situations (games, I guess). Another
downside is that if you have an object retained only by being present on the
stack frame of a C function, it would have to be explicitly retained from elsewhere.
Doesn't this make it impractical to mix vanilla C with D code? An important
feature of D is this capability, without worrying about a "JNI" style interface.
As for D switching to a full refcounted GC for everything, I'm very hesitant for
such a step. For one thing, reading the clang spec on all the various pointer
and function annotations necessary is very off-putting.
>
> As for pointers not pointing to GC memory, the generic addRef/release
functions can ignore those pointers just like the GC ignores them today when it
does its scan.
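
One way this filtering could look, assuming druntime's GC.addrOf is used to find the base address of the block: it returns null for pointers that do not point into GC memory, so those are simply skipped. gcCounts and addRefGC are hypothetical names for this sketch.

```d
import core.memory : GC;

// Hypothetical side table keyed by the base address of a GC block.
size_t[void*] gcCounts;

// Returns true if the pointer was counted, false if it was ignored
// because it does not point into GC-allocated memory.
bool addRefGC(void* p)
{
    void* base = GC.addrOf(p);   // null when p is outside the GC heap
    if (base is null)
        return false;

    if (auto c = base in gcCounts)
        ++*c;
    else
        gcCounts[base] = 1;
    return true;
}
```

Interior pointers into the same block resolve to the same base address and so share one counter, while stack and malloc'ed pointers never enter the table.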
>
> Finally, cycles can still be reclaimed by having the GC scan for them. Those
scans should be less frequent however since most of the memory can be reclaimed
through reference counting.