Smart pointers instead of GC?

Mon Feb 3 18:26:40 PST 2014

On Mon, 03 Feb 2014 18:14:36 -0800, Ola Fosheim Grøstad  
<ola.fosheim.grostad+dlang at gmail.com> wrote:

> On Tuesday, 4 February 2014 at 01:36:09 UTC, Adam Wilson wrote:
>> 1. RC imposes a time overhead on mutators in order to manipulate the  
>> counter.
>
> Usually you just do a retain() when your thread attain ownership (at the  
> root of the call tree). If you use embedded counters you most likely  
> want the data in the cache anyway, but the cacheline is made dirty. So  
> it cost you a write back. That affects reads of small objects more than  
> writes.
>
> (The C++ implementation does not use embedded counters.)
>
>> 2. Both the counter manipulation and pointer load/store operations MUST  
>> be atomic to prevent races.
>
> Not with transactional memory. You revert to locking, and you probably  
> want that kind of synchronization anyway if you run multi threaded.
>
>> 3. Naive RC turns read ops into store ops to update the count
>
> If we assume naive RC, then we should assume naive GC.
>

Indeed D is a naive GC, but that's not the point. Naive RC incurs a  
penalty that even Naive GC's don't.

>> 4. No RC can reclaim cyclic data structures, which are much more common  
>> than is typically understood. [Bacon and Rajan 2001]
>
> Weak pointers are sufficient to break cycles. You need a model, but you  
> should have one anyway.
>

Yes, they absolutely are. However, supporting them at the automatic level  
requires adding keywords to the language. If ARC is made the default in D  
you will automatically have memory leaks where before you did not, and  
programmer will have to carefully scan the code by hand to find everything  
that needs to be marked with a 'weak' keyword. It's an extremely  
frustrating and error-prone process. The compiler cannot detect  
cyclic-refs, it's a halting problem.

> In cases where you build temporary data structures you can just avoid RC  
> altogether and just use a pool, then free the entire pool.
>
>> 5. Counter must be the same size as the pointer, which can result in  
>> significant overhead for small objects.
>
> No? The counter must be able to hold the # of retain() calls.
>

Yes, which can theoretically be as many as a 64-bit integer. In a systems  
language you have to design for what is possible, not merely likely.  
Because someday, some nutcase, somewhere is going to overflow the counter,  
and given that it's hidden by the compiler, will be VERY tricky to figure  
out. And he will have a perfectly rational reason for doing so.

>> 6. RC can still pause. When the last head to a large pointer structure  
>> is deleted, RC MUST delete each descendant node.
>
> Not really, you can postpone the deletion by pushing it onto a queue.  
> The free it not by node at your own leisure...
>

The GC Handbook deals with that idea directly: "Unfortunately, this  
technique allows large garbage structures to be hidden by smaller ones,  
and hence increases overall space requirements. [Boehm, 2004]" - The  
Garbage Collection Handbook

>> Note that these are paraphrases of the book, not me talking. And these  
>> apply equally to ARC and vanilla RC.
>
> You don't  use RC alone, you use it with fully owned pointers, region  
> pools, etc.
>
> GC is simpler and more uniform, and that's it.

That may be "it", but it's a pretty big "it". Simplicity and uniformity  
are VERY important.

-- 
Adam Wilson
GitHub/IRC: LightBender
Aurora Project Coordinator