Improving DIP74: functions borrow by default, retain only if needed

John McCall via Digitalmars-d digitalmars-d at puremagic.com
Fri Mar 6 11:57:27 PST 2015


On Saturday, 28 February 2015 at 02:55:14 UTC, Michel Fortin wrote:
> On 2015-02-27 23:11:55 +0000, deadalnix said:
>
>> On Friday, 27 February 2015 at 23:06:26 UTC, Andrei Alexandrescu wrote:
>>> OK, so at least in theory autorelease pools are not necessary
>>> for getting ARC to work? -- Andrei
>> 
>> ARC needs them; this is part of the spec. You can have good RC
>> without them, IMO.
>
> Apple's ARC needs autorelease pools to interact with 
> Objective-C code. But if by ARC you just mean what the acronym 
> stands for -- automatic reference counting -- there's no need 
> for autorelease pools to implement ARC.

Hi, I'm the core language designer for ObjC ARC.  I was pointed at
this thread by a coworker.

ObjC ARC uses +0 conventions for parameters and results by
default only because those are the conventions used by manual
reference counting (MRC) in Objective-C.  Those conventions
developed that way because programmers were manually implementing
them.  ARC can only deviate from those conventions when it's
certain that it knows about all callers and implementations;
since individual files in a project can be selectively compiled
in either ARC or MRC, and since Objective-C methods can be
dynamically overridden and reflectively called, that would only
be possible for static functions, and even then we might need
thunks when taking their address.

A +0 result convention does have some benefits when implemented
by programmers.  Assuming you don't care about safety in the
presence of data races, certain functions (getters, chiefly) can
simply return "borrowed" references which the caller can use
without retaining if they're very careful.  This is almost
completely impossible for a compiler to take effective advantage
of, because programmers can make much more aggressive/unsafe
assumptions about the behavior of code ("It's obvious that none
of these calls can invalidate my borrowed reference.").  And it
creates the problem of what to do when you have to follow a +0
convention but have a naturally +1 result.
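
To make the borrowed-getter pattern concrete, here's a rough D
sketch of the hand-written version; the class and method here are
made up for illustration, not taken from DIP74:

    class Node
    {
        private Node next_;

        // +0 result convention: return a borrowed reference with no
        // retain.  Cheap, but only safe if nothing can release next_
        // while the borrowed reference is in use.
        Node next() { return next_; }
    }

    void walk(Node head)
    {
        for (auto n = head; n !is null; n = n.next())
        {
            // Safe without retains only because nothing in this loop
            // can mutate the list; a compiler generally cannot prove
            // that, so it would insert a retain/release pair per step.
        }
    }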

There is absolutely no reason to emulate what ObjC ARC does with
autoreleased results.  It's bad for performance in about
half-a-dozen different ways, and the trick we use to avoid actual
autoreleases is extremely brittle.  If you really care about the
borrowed result optimization, you can use a dynamic convention,
where you also return a flag saying whether the result is
borrowed; it then becomes a neat optimization problem to actually
take advantage of that.  That was never an option for ARC because
it's not MRC compatible.  I would be very concerned about the
code-size impact of doing this for arbitrary calls, but you could
consider selectively using it for getters.
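
A sketch of that dynamic convention in D; the RCResult struct and
the release() helper are illustrative stand-ins for whatever the
compiler would actually emit:

    class Obj {}

    void release(Obj o) { /* hypothetical: drop one reference */ }

    struct RCResult
    {
        Obj value;
        bool owned;  // true: +1, caller must release; false: borrowed +0
    }

    Obj cached;

    RCResult get(bool wantFresh)
    {
        if (!wantFresh)
            return RCResult(cached, false);  // borrowed from storage
        return RCResult(new Obj, true);      // naturally +1 result
    }

    void caller()
    {
        auto r = get(false);
        scope (exit) if (r.owned) release(r.value);  // balance only if owned
        // ... use r.value without retaining ...
    }

The optimization problem is then proving the flag statically at a
given call site so the conditional release folds away.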

Parameters are a different story, and you can make a case either way.

On the caller side, the function got the argument reference from
somewhere, probably by constructing it or calling a function that
returned it.  Even if the reference was loaded from memory, it
may need to be retained for safety's sake if the memory is
mutable.  So the caller generally owns a retain of the argument.
A +1 convention allows that reference to simply be forwarded
without extra work in the common case that it's used in exactly
one place.  The disadvantage is that, if the reference is used
multiple times, it may need to be retained multiple times just to
balance the convention.
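
Roughly, in D, with retain() and release() as hypothetical
stand-ins for compiler-emitted operations:

    class Obj {}

    void retain(Obj o)  { /* hypothetical: take one reference */ }
    void release(Obj o) { /* hypothetical: drop one reference */ }

    Obj slot;

    // +1 parameter: the callee consumes the caller's reference.
    void store(Obj o) { slot = o; }  // ownership forwarded, nothing emitted

    void commonCase()
    {
        store(new Obj);  // construction is naturally +1; forwarded for free
    }

    void multiUse(Obj o)  // o arrives at +1
    {
        retain(o);  // extra retain purely to balance the convention
        store(o);   // consumes the original +1
        store(o);   // consumes the extra retain
    }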

On the callee side, the language needs to guarantee that the
object stays valid as long as it's being used within the
function.  In a +0 convention, you can have the caller make that
guarantee; of course, that means the caller will always have to
retain unless it's able to forward a similar guarantee from
somewhere else.  Without this guarantee, in a +0 convention the
callee's probably going to need to retain anyway.  (Unfortunately,
in Objective-C the caller does not make this guarantee.)
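
The callee-side difference, again as a D sketch with hypothetical
helpers:

    class Obj {}

    void retain(Obj o)  { /* hypothetical */ }
    void release(Obj o) { /* hypothetical */ }
    void somethingThatMightRelease() { /* e.g. mutates shared storage */ }

    // +0 non-guaranteed: the caller promises nothing about o's
    // lifetime, so the callee must defensively retain before any call
    // that could drop the last reference.
    void calleeNonGuaranteed(Obj o)
    {
        retain(o);
        scope (exit) release(o);
        somethingThatMightRelease();
        // o is still valid here only because of our own retain
    }

    // +0 guaranteed: the caller keeps o alive for the whole call, so
    // the retain/release pair above disappears.
    void calleeGuaranteed(Obj o)
    {
        somethingThatMightRelease();
        // o is still valid here because the caller guarantees it
    }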

You can imagine situations where any one of these three
conventions (+1, +0 guaranteed, +0 non-guaranteed) is the most
profitable.

I tend to prefer a +1 convention because of its impact on common,
straight-line code.

A +0 non-guaranteed convention is very nice for higher-order
algorithms on arrays because you can briefly borrow the reference
from the array and let the callee decide whether it needs to
retain.  If the callee is some lightweight function like a sort
comparator, it probably doesn't need to.  But the convention is
awful for more complex code because it frequently forces both
sides to own a retain.
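
For instance, a comparator under +0 non-guaranteed might look like
this (sketch; the element type is made up):

    class Item { int key; }

    // The sort borrows each element from the array for the duration
    // of the call; a comparator like this never stores its arguments,
    // so neither side performs a single retain or release.
    bool less(Item a, Item b)  // a, b at +0 non-guaranteed
    {
        return a.key < b.key;
    }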

A +0 guaranteed convention avoids creating redundant work for
values used multiple times, but it does prevent a reference from
being "forwarded": if you allocate it in the caller, and then
store it somewhere in the callee, you're going to need a
redundant retain.  Consider using this for select arguments like
the "this" argument of a method.
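
A sketch of that trade-off, with retain() again standing in for a
compiler-emitted operation:

    class Counter
    {
        private int n;
        // `this` at +0 guaranteed: the body never retains the
        // receiver, however many times the method is called.
        void bump() { ++n; }
    }

    void retain(Counter c) { /* hypothetical */ }

    Counter slot;

    void hammer(Counter c)  // c at +0 guaranteed from our own caller
    {
        foreach (i; 0 .. 1000)
            c.bump();       // guarantee forwarded: zero RC traffic
    }

    void stash(Counter c)   // c at +0 guaranteed
    {
        retain(c);          // the redundant retain; a +1 convention
        slot = c;           // could have consumed the caller's reference
    }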

All of this analysis assumes that you have some built-in
optimization of retain and release operations.

I probably won't watch this thread, but feel free to email me if
you have further questions.

