GC.forget() (Was: O(N) Garbage collection?)

Steven Schveighoffer schveiguy at yahoo.com
Sun Feb 20 08:37:26 PST 2011


On Sat, 19 Feb 2011 22:39:10 -0500, Nick Sabalausky <a at a.a> wrote:

> "bearophile" <bearophileHUGS at lycos.com> wrote in message
> news:ijpkh8$232r$1 at digitalmars.com...
>> dsimcha:
>>
>>> Yeh, I rebuilt the same model in my head over the past few hours (like
>>> you, I had a good mental model of the GC at one point but have slowly
>>> forgotten it).
>>
>> For serious D programmers, having such a model in your head is so
>> important that I'd like a short page with such explanations & pseudocode
>> on the D site :-)
>>
>
> I seem to remember finding out at one point (the hard way) that arrays of
> primitives are scanned for pointers unless you explicitly tell the GC not
> to do so (for each array). A long time ago, this caused me major headaches
> when I tried to write a batch processing tool for image files. I never
> actually verified this, but my understanding was that the file and image
> data contained false pointers that prevented the data from completed
> images from being collected. So the program just happily kept munching
> away at more and more memory (and presumably, more false pointers) with
> each image, until after only a fairly small number of files it ground to
> a sputtering halt under the weight of its own memory footprint.

This is not correct.  Arrays of value elements (builtins, or custom structs  
with no pointers in them; the compiler records this in the TypeInfo) are  
marked NO_SCAN when the array is constructed.
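
To see which types get the flag, std.traits.hasIndirections mirrors the  
decision recorded in the TypeInfo, and core.memory's GC.getAttr can show  
the attribute on a live block.  Here's a small illustrative program (this  
uses druntime's core.memory interface; details may differ across versions):

import core.memory : GC;
import std.traits : hasIndirections;
import std.stdio : writeln;

struct Plain { int x; double y; }  // no pointers: arrays of it get NO_SCAN
struct Boxed { int x; int* p; }    // contains a pointer: must be scanned

static assert(!hasIndirections!Plain);
static assert( hasIndirections!Boxed);

void main()
{
    // Allocate raw blocks directly so the returned pointers are block
    // bases (GC.getAttr ignores interior pointers).
    auto noScan  = GC.malloc(4096, GC.BlkAttr.NO_SCAN);
    auto scanned = GC.malloc(4096);

    writeln((GC.getAttr(noScan)  & GC.BlkAttr.NO_SCAN) != 0); // true
    writeln((GC.getAttr(scanned) & GC.BlkAttr.NO_SCAN) != 0); // false
}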

Here are the two problems I know of:

1. A struct can have sparse pointers in it; an array of such structs will  
not be marked NO_SCAN.  This means even the non-pointer parts are scanned  
as if they were pointers.  For example:

struct Bad
{
    void* pointer;   // one real pointer forces scanning of the whole block
    int[1024] data;  // 4096 bytes of plain ints, scanned as pointers anyway
}

The data member is scanned in addition to the pointer member.  This can  
create false hits on the heap, keeping otherwise-dead blocks alive.

IMO, this problem is not very significant -- those types of aggregates are  
uncommon.
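
That said, if such a layout does come up, one workaround (my sketch, not  
something from this thread) is to move the bulk data into its own  
allocation, since a pointer-free int[] block is marked NO_SCAN on its own:

struct Better
{
    void* pointer;  // still scanned, but the struct block is now tiny
    int[] data;     // slice of a separate block the GC marks NO_SCAN
}

void main()
{
    auto arr = new Better[100];
    foreach (ref b; arr)
        b.data = new int[1024];  // pointer-free block: allocated NO_SCAN
    // Only the small Better[] block is scanned conservatively; the bulk
    // integer data is skipped entirely.
}

The cost is an extra allocation and indirection per element, but the  
conservative scan now covers bytes per struct instead of kilobytes.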

2. A large value-element array is kept in memory because of false  
pointers.  Several regions of memory are *always* scanned as if every  
word were a pointer.  This means you always have potential false pointers  
lying around, even if you never allocate any heap data: the stack is  
always scanned conservatively, and so are global and TLS data.  I think  
even the proposed "precise" scanning patches for D's GC in bugzilla do  
not address the stack or global space.

This is a bigger problem.  For example, a 200MB array in a 4GB (32-bit)  
address space occupies 1/20th of the total address space, so any single  
random word on the stack, in TLS, or in the globals has roughly a 1/20  
chance of "pointing" at that data.  And there are many such words, so the  
odds that at least one of them pins the array are considerably higher.

If you stop pointing at such an array without explicitly deleting it,  
that memory is effectively leaked.  And the more of these arrays you  
allocate, the more false pointers you are likely to accumulate.
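
A sketch of how that failure mode plays out (hypothetical sizes, matching  
the image-batch scenario quoted above):

void main()
{
    ubyte[] current;
    foreach (file; 0 .. 50)
    {
        // Re-point at a fresh 200MB buffer.  The old block is garbage,
        // but any stack/TLS/global word that happens to "point" into it
        // pins the whole 200MB across collections.
        current = new ubyte[200 * 1024 * 1024];
        // ... decode and process one image ...
    }
    // On a 32-bit process this will likely die with an out-of-memory
    // error long before the loop finishes, even though at most one
    // buffer is live at a time.
}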

This second issue, I think, is much more likely to happen.  I proposed at  
one point that we should have a library construct that helps free such  
chunks of memory deterministically, to avoid this problem.  I don't think  
it was well received, but I can't really remember.
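
For the curious, the subject line's GC.forget() idea can be approximated  
with core.memory today (a hedged sketch; "forget" is my name for it, not  
an actual druntime function, and the caller must guarantee no other  
references to the block exist):

import core.memory : GC;

// Deterministically free a GC array's block and null the reference,
// so no false pointer can resurrect it afterwards.
void forget(T)(ref T[] arr)
{
    if (arr.ptr !is null)
        GC.free(GC.addrOf(arr.ptr));  // addrOf resolves interior pointers;
                                      // free(null) is a no-op for non-GC memory
    arr = null;
}

In the image-processing loop above, calling forget(current) before each  
new allocation would release the old buffer regardless of false pointers.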

-Steve
