GC and dtors ~ a different approach?

kris foo at bar.com
Mon Apr 10 11:43:38 PDT 2006


Bruno Medeiros wrote:
> kris wrote:
> 
>> I thought it worthwhile to review the dtor behaviour and view the 
>> concerns from a different direction:
>>
>> dtor 'state' valid:
>> - explicit invocation via delete keyword
>> - explicit invocation via raii
>>
>> dtor state 'unspecified':
>> - implicitly called when no more references are held to the object
>> - implicitly called when a program terminates
>>
>>
>> Just for fun, let's assume the 'unspecified' issue cannot be resolved. 
>> Let's also assume there are dtors which expect to "clean up", and 
>> which will fail when the dtor state is 'unspecified'.
>>
>> What happens when a programmer forgets to explicitly delete such an 
>> object? Well, the program is highly likely to fail (or be in an 
>> inconsistent state) after the GC collects said object. This might be 
>> before or during program termination.
>>
>> How does one ensure this cannot occur? One obvious method would be for 
>> the GC to /not/ invoke any dtor by default. While the GC would still 
>> collect, such a change would ensure it cannot be the cause of a 
>> failing program (it would also make the GC a little faster, but that's 
>> probably beside the point).
>>
>> Assuming that were the case, we're left with only the two cases where 
>> cleanup is explicit and the dtor state is 'valid': via the delete 
>> keyword, and via raii (both of which apply the same functionality).
>>
>> This would tend to relieve the need for an explicit dispose() pattern, 
>> since the dtor is now the equivalent?
>>
>> What about implicit cleanup? In this scenario, it doesn't happen. If 
>> you don't explicitly (via delete or via raii) delete an object, the 
>> dtor is not invoked. This applies the notion that it's better to have 
>> a leak than a dead program. The leak is a bug to be resolved.
>>
>> What would be really nice is a tool to tell us about such leaks. It 
>> should be possible for the GC (when configured to do so) to identify 
>> collected objects which have a non-default dtor. In other words, the 
>> GC can probably tell if a custom dtor is present (it has a different 
>> address than a default dtor?). If the GC finds one of these during a 
>> normal collection cycle, and is about to collect it, it might raise a 
>> runtime error to indicate the leak instance?
>>
>> Anyway ~ to summarize, this would have the following effect:
>>
>> 1) no more bogus crashes due to dtors being invoked in an invalid state
>> 2) no need for the dispose() pattern
>> 3) normal collection does not invoke dtors, making it a little faster
>> 4) there's a possibility of a tool to identify and capture leaking 
>> resources. Something which would be handy anyway.
>>
>>
>> For the sake of example: "unscoped" resources, such as 
>> connection-pools, would operate per normal in this scenario: the pool 
>> elements should be deleted explicitly by the hosting pool (or be 
>> treated as leaks, if they have a custom dtor). The pool itself would 
>> have to be deleted explicitly also ~ as is currently the case today ~ 
>> which can optionally be handled via a module-dtor.
>>
>> Thoughts?
> 
> 
> All of those pros you mention are valid. But you'd have one serious con:
> * Any class which required cleanup would have to be manually memory 
> managed.
> 

Thanks;

First, let's change the verbiage of "valid" and "unspecified" to be 
"deterministic" and "non-deterministic" respectively (per Don C).

This makes it clear that a dtor invoked /lazily/ by the GC will be 
invoked in a non-deterministic state (how the GC works today). This 
non-deterministic state means that any or all GC-managed objects 
referenced only by a class instance may well have been collected 
already by the time the relevant dtor is invoked.

The other aspect to consider is the timeliness of cleanup. Mike suggests 
that classes that actually have something to clean up should do so in a 
timely manner, and that the indicator for this is the presence of a dtor.

To get to your assertion: under the suggested model, any class with 
resources that need to be released should either be 'delete'd at some 
appropriate point, or have raii applied to it. Classes with dtors that 
are not cleaned up in this manner can be treated as "leaks" (and can be 
identified at runtime).

Thus, the term "manually memory managed" is not as clear as it might be: 
raii can be used to clean up, scope(exit) can be used to clean up, and 
an explicit 'delete' can be used to clean up. There's no malloc() or 
anything like that involved.

The truly serious problem with a 'lazy' cleanup is that the dtor will 
wind up invoked with non-deterministic state (typically leading to a 
serious error). The other concern with lazy cleanup is what Mike 
addresses (if the resource needs cleaning up, it should be done in a 
timely manner ~ not at some arbitrary point in the future).

What would be an example of a class requiring cleanup, which should be 
performed lazily? I can't think of a reasonable one off-hand, but let's 
take an example anyway:

Suppose I have a class that holds a file-handle. This handle should be 
released when the class is no longer in use. Luckily, the file-handle 
does not need to be GC-managed itself (it can be held by the class as 
an integer). This provides us with two choices ~ release the handle in a 
timely fashion, or release it at some undetermined point in the future 
(when the class is collected). We're lucky to have a choice here; it's 
actually something of a special case.

The model suggested follows Mike's proposal that the file-handle should 
actually be released as soon as reasonably possible. RAII can be used to 
ensure that happens automagically. What happens if said class is not 
raii, and is not hit with a 'delete'? The suggested model can easily 
identify that class instance as a "leak" when collected by the GC, and 
report it as such. That is: instead of the collector invoking the 
dtor with a non-deterministic state, it instead identifies a leaking 
resource.
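
To make that concrete, here is a rough sketch of the idea. It is 
written in Python rather than D, purely as a model, and it is not how 
the D GC is implemented. Every name in it (FileResource, release, 
OPEN_HANDLES, LEAK_REPORT, demo) is made up for illustration, and the 
"handle" is a plain integer in a fake table rather than a real OS 
handle. The 'with' block stands in for raii or an explicit 'delete'; 
the __del__ hook stands in for the collector, which only reports the 
leak and performs no cleanup. It also assumes CPython, where an object 
is finalized as soon as its last reference goes away.

```python
import gc

OPEN_HANDLES = set()  # hypothetical stand-in for the OS handle table
LEAK_REPORT = []      # hypothetical leak log, filled in by the collector path


class FileResource:
    """Holds a 'file handle' as a plain integer (not a GC-managed object)."""

    def __init__(self, handle):
        self.handle = handle
        self.released = False
        OPEN_HANDLES.add(handle)

    def release(self):
        # Deterministic cleanup: the role played by the dtor when it is
        # run via 'delete' or raii in the model described above.
        if not self.released:
            OPEN_HANDLES.discard(self.handle)
            self.released = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Scope exit releases the handle, like raii / scope(exit).
        self.release()
        return False

    def __del__(self):
        # Collector path: do NOT clean up, just record the leak.
        # getattr guards against a partially constructed instance.
        if not getattr(self, "released", True):
            LEAK_REPORT.append(self.handle)


def demo():
    # Timely cleanup: released on scope exit, so nothing is reported.
    with FileResource(3):
        pass

    # Forgotten instance: never released, so collection reports a leak.
    forgotten = FileResource(4)
    del forgotten
    gc.collect()

    return sorted(OPEN_HANDLES), list(LEAK_REPORT)
```

On CPython, demo() should return ([4], [4]): handle 3 was released on 
scope exit, while handle 4 is still open and has been flagged as a leak 
to be fixed, rather than being cleaned up lazily by the collector.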

As far as automatic cleanup goes, I think D is already well armed via 
raii and the scope() idiom. Adopting an attitude of cleaning up 
resources in a timely manner will surely only be of benefit in the long 
run?

Another approach here is to allow the collector to invoke the dtor (as 
it does today), and somehow ensure that its state is fully deterministic 
(which is not done today). I suspect that would be notably more 
expensive and/or difficult to achieve? However, that also does not 
address Mike's concern about timely cleanup, which I think is a valid 
concern. Thus, I really like the simplicity of the model as described 
above. It also has the added bonus of eliminating the need for a 
redundant dispose() pattern, and makes the GC a little faster :-)

- Kris



More information about the Digitalmars-d mailing list