GC, the simple solution

Sun Jun 4 20:06:24 PDT 2006

Frank Benoit wrote:
> Ok, i see now that my "best and simplest" solution is neiter of both :)
> But I don't want to throw the idea away so fast. Please lets discuss it
> a bit further.

I'm not throwing it away so fast, I've thought about it for several 
years <g>.

> Let's think about the scenario of having reference counting (RC) and
> mark&sweep (MS) in parallel. Like Daniel Keep said, it is used in
> Python. If Python uses this, the idea cannot be so crazy.

Maybe, maybe not. Python isn't known for its speed. Java uses gc, not rc.

>> 1) cyclical data structures won't get free'd
> 
> Combine RC and MS. So cycles are detected. It should be possible to
> avoid cycles at all, if you don't want to have any MS pause.

There are ways to avoid the RC cycle problem, but I don't know how 
expensive they are.

>> 2) every pointer copy requires an increment and a corresponding
>> decrement - including when simply passing a reference to a function
> This is true for a smart pointer class. But is it really true for a
> compiler supported RC?

Yes.

> If I call a func/method, I hold a valid reference to the object. While
> this reference is copied down the stack, these are only temporary
> reference. And all of them are release before the call returns to the
> caller. You can ignore them all for RC.

This fails if the passed reference 'escapes'. It can escape by being 
assigned to a global, being assigned to a class member, being inserted 
into some data structure that lives longer than the function, and being 
returned by the function.

> Only if the reference is stored,
> the ref counter has to be incremented. This is only possible if this is
> done by the compiler. No smart pointer can do this :)
> 
>> 3) in a multithreaded app, the incs and decs must be synchronized
> 
> Isn't a atomic inc/dec enough?

That requires synchronizing.

> (Don't know anything about the
> performance of the asm "lock" instruction)

It's slow enough to be noticable.

>> 4) exception handlers (finally's) must be inserted to handle all the
>> decs so there are no leaks. Contrary to assertions otherwise, there is
>> no such thing as "zero overhead exceptions."
> 
> Yes, this is additional overhead in code size and runtime. And there are
> additional post destructors needed, to set all member refs to null,
> which wasn't nulled by the dtor.
> 
>> 5) in order to support slicing and interior pointers, as well as
>> supporting reference counting on arbitrary allocations of non-object
>> data, a separate "wrapper" object must be allocated for each allocation
>> to be ref counted. This essentially doubles the number of allocations
>> needed.
> 
> The MS GC allready has a memory registry. Isn't there stored, where an
> object begins? Let the ref counter be in this registry. This should work
> with all types and slicing and object member references.

Yes, you can find out the beginning of an arbitrary pointer's memory 
block. But that isn't a cheap operation.

>> 6) The wrapper object will mean that all pointers will need to be
>> double-dereferenced to access the data.
> 
> => 5.)

Finding the start of the memory chunk will be several times more 
expensive than the double indirect.

>> 7) Fixing the compiler to hide all this stuff from the programmer will
>> make it difficult to interface cleanly with C.
> 
> hm, ok. Here is some more manual work necessary.
> If you give the last reference to a C lib, isn't this a problem for the
> MS GC also?

I don't know anything about the MS GC.

>> 8) Ref counting can fragment the heap thereby consuming more memory just
>> like the gc can, though the gc typically will consume more memory overall.
> 
> Isn't the actual GC implementation also fragmenting the heap?

Yes. malloc will fragment the heap, too. But only GC has the hope of 
doing compaction.

> But I don't have any idea to this issue.
> 
>> 9) Ref counting does not eliminated latency problems, it just reduces them.
> 
> Where do you mean is the latency with RC? The collecting work is done in
> the smallest pieces all the time. Only the allocation of new memory
> needs an unbound time to complete.

That's the same with gc. Only upon an allocation is the time unbound.

> But it does not need to stop all
> other threads. e.g. hardware interrupts don't need to be disabled.

GC doesn't require disabling hardware interrupts. It does not need to 
stop any threads that do not hold references to GC allocated data.

>> The proposed C++ shared_ptr<>, which implements ref counting, suffers
>> from all these faults. I haven't seen a heads up benchmark of
>> shared_ptr<> vs mark/sweep, but I wouldn't be surprised if shared_ptr<>
>> turned out to be a significant loser in terms of both performance and
>> memory consumption.
>>
>> That said, I'd like for D to optionally support some form of ref
>> counting, as rc is better for managing scarce resources like file handles.
> 
> As I said, I would be happy to be able to switch the whole runtime
> system to RC. If I would loose 50% performance this is acceptable for me
> if I get latency < 100us.

Latency cannot be guaranteed even with malloc().

I don't know what constraints you have on the system you're developing. 
But when I've written real time interrupt code, the ISRs were written to 
not make any OS calls or any allocations. All they'd do is just gather 
data and post it to a FIFO buffer. A non-realtime thread would then pull 
the data out of the buffer and process it. The system would work without 
dropping data as long as the average (not maximum) processing time per 
datum was less than the data acquisition rate.