Is Phobos's Garbage Collector utterly broken? (Phobos vs Tango)

Sean Kelly sean at f4.ca
Tue Aug 7 09:03:56 PDT 2007


Vladimir Panteleev wrote:
> On Mon, 06 Aug 2007 18:20:22 +0300, Sean Kelly <sean at f4.ca> wrote:
> 
>> Vladimir Panteleev wrote:
>>> On Wed, 01 Aug 2007 09:08:16 +0300, Vladimir Panteleev <thecybershadow at gmail.com> wrote:
>>>
>>>> I initially wrote it to try to find a memory leak in Tango's GC (which was actually fixed at some point).
>>> Turns out it's still there, and it's the old "binary data" issue with pointer-searching GCs, which was fixed in D/Phobos 1.001 by making the GC type-aware. Check out the attached sample programs for a simple example - the Tango version can't know there are no pointers in its GrowBuffer's data, and thus leaks like crazy, while the Phobos version stays at 13MB.
>> The cause of this is somewhat an artifact of the OO design in Tango.
>> The underlying buffer being allocated is a byte[], but the reference to
>> it is a void[].  The problem occurs when GrowBuffer grows the buffer by
>> increasing its length, which causes the buffer to be reallocated as a
>> void[].  The reason this is a problem is that neither runtime, Tango or
>> Phobos, preserves memory block attributes during a reallocation--they
>> both simply key off the type being used to perform the reallocation.
>> Obviously, this is a problem, and I've decided to change the behavior in
>> Tango accordingly.  It will take some doing and I'm a bit over-busy at
>> the moment, but before long the Tango runtime will preserve all block
>> attributes on a reallocation.  In essence, this will occur by having the
>> runtime call gc_realloc, but before this will work gc_realloc must be
>> fixed to handle slices.
> 
> I'd still rather vote towards making the GC not scan void[] - it makes the most sense.

Others have expressed the same opinion.  I'll withhold my own, except 
to say that I think the reasoning behind the current approach is twofold:

1. void[] is the 'any' buffer type for in-program data.  The type of the 
underlying data could be an array of bytes or it could be an array of 
structs containing pointers.  The 'any' buffer type for out-of-program 
data is byte[], because until such data is translated to D types, it's 
merely a stream of bytes.

2. It is nice to have an in-language option for specifying that an 'any' 
buffer type may contain pointers.
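
To make the distinction concrete, here is a minimal sketch.  It assumes 
the core.memory API of present-day druntime (GC.getAttr, GC.addrOf, 
GC.BlkAttr.NO_SCAN) rather than the Tango or Phobos runtimes discussed 
in this thread, and hasNoScan is a helper name invented for the 
illustration.  It only reports whether the collector will scan each 
block for pointers:

```d
import core.memory : GC;
import std.stdio : writefln;

/// Hypothetical helper: true if the GC block holding `p` is marked
/// NO_SCAN, i.e. the collector will not search it for pointers.
bool hasNoScan(const void* p)
{
    // addrOf maps a possibly interior pointer to its block's base address.
    return (GC.getAttr(GC.addrOf(p)) & GC.BlkAttr.NO_SCAN) != 0;
}

void main()
{
    byte[] raw = new byte[64];   // element type cannot hold pointers
    void[] any = new void[64];   // element type unknown, may hold pointers

    writefln("byte[] block NO_SCAN: %s", hasNoScan(raw.ptr));
    writefln("void[] block NO_SCAN: %s", hasNoScan(any.ptr));
}
```

The attribute is chosen from the type used at allocation time, which is 
why the type seen by the allocating expression matters more than the 
type of the reference later used to hold the data.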

The most obvious counter-argument is that by assigning special behavior 
to void[], those who don't like that behavior must use something else 
and lose the implicit conversion that void[] provides, which is 
extremely convenient in some cases.  Another is that because void[] 
does not specify a type, it is appropriate for out-of-program data as 
well, and scanning a stream of bytes read from a file for pointers is 
both wasted work and a source of false pointers.

I personally don't think there is a solution to this that will make 
everyone happy.  I am hoping that by preserving block attributes I will 
make void[] more usable as a reference type, since then only the first 
allocation must be "new byte[x]".  Preserving attributes is also more 
consistent: at present, some reallocations obtain a new array (and lose 
block information), while others do not (and preserve block information).
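
A rough sketch of the two reallocation paths follows.  As before, it 
assumes the core.memory API of present-day druntime, not the 2007 Tango 
runtime, and hasNoScan is a hypothetical helper.  What the second line 
of output shows depends on the runtime version, so no result is claimed 
here for the .length case:

```d
import core.memory : GC;
import std.stdio : writefln;

/// Hypothetical helper: true if the GC block holding `p` is NO_SCAN.
bool hasNoScan(const void* p)
{
    return (GC.getAttr(GC.addrOf(p)) & GC.BlkAttr.NO_SCAN) != 0;
}

void main()
{
    // Explicit path: GC.realloc called without an attribute argument
    // carries the block's existing attributes over to the new block.
    void* p = GC.malloc(64, GC.BlkAttr.NO_SCAN);
    p = GC.realloc(p, 1 << 20);
    writefln("after GC.realloc:   NO_SCAN = %s", hasNoScan(p));

    // Implicit path: the block is allocated as byte[] but grown through
    // a void[] reference, which is the GrowBuffer situation above.
    void[] buf = new byte[64];
    buf.length = 1 << 20;
    writefln("after .length grow: NO_SCAN = %s", hasNoScan(buf.ptr));
}
```

If the runtime keys off the static type when growing, the second block 
comes back scannable; if it preserves attributes, it stays NO_SCAN.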


Sean


