Is Phobos's Garbage Collector utterly broken? (Phobos vs Tango)
Sean Kelly
sean at f4.ca
Tue Aug 7 09:03:56 PDT 2007
Vladimir Panteleev wrote:
> On Mon, 06 Aug 2007 18:20:22 +0300, Sean Kelly <sean at f4.ca> wrote:
>
>> Vladimir Panteleev wrote:
>>> On Wed, 01 Aug 2007 09:08:16 +0300, Vladimir Panteleev <thecybershadow at gmail.com> wrote:
>>>
>>>> I initially wrote it to try to find a memory leak in Tango's GC (which was actually fixed at some point).
>>> Turns out it's still there, and it's the old "binary data" issue with pointer-searching GCs, which was fixed in D/Phobos 1.001 by making the GC type-aware. Check out the attached sample programs for a simple example - the Tango version can't know there are no pointers in its GrowBuffer's data, and thus leaks like crazy, while the Phobos version stays at 13MB.
>> The cause of this is somewhat an artifact of the OO design in Tango.
>> The underlying buffer being allocated is a byte[], but the reference to
>> it is a void[]. The problem occurs when GrowBuffer grows the buffer by
>> increasing its length, which causes the buffer to be reallocated as a
>> void[]. The reason this is a problem is that neither runtime, Tango or
>> Phobos, preserves memory block attributes during a reallocation--they
>> both simply key off the type being used to perform the reallocation.
>> Obviously, this is a problem, and I've decided to change the behavior in
>> Tango accordingly. It will take some doing and I'm a bit over-busy at
>> the moment, but before long the Tango runtime will preserve all block
>> attributes on a reallocation. In essence, this will occur by having the
>> runtime call gc_realloc, but before this will work gc_realloc must be
>> fixed to handle slices.
>
> I'd still rather vote towards making the GC not scan void[] - it makes the most sense.
Others have expressed the same opinion. I'll withhold my own thoughts
except to say that I think the idea behind the current approach is twofold:
1. void[] is the 'any' buffer type for in-program data. The type of the
underlying data could be an array of bytes or it could be an array of
structs containing pointers. The 'any' buffer type for out-of-program
data is byte[], because until such data is translated to D types, it's
merely a stream of bytes.
2. It is nice to have an in-language option for specifying that an 'any'
buffer type may contain pointers.
The most obvious counter-argument is that by assigning special behavior
to void[], those who don't like that behavior must use something else
and lose the implicit conversion that void[] provides, which is
extremely convenient in some cases. Another is that because void[]
does not specify a type, it is appropriate for out-of-program data as
well, and scanning a stream of bytes read from a file for pointers is bad.
I personally don't think there is a solution to this that will make
everyone happy, and am hoping that by preserving block attributes I will
make void[] more usable as a reference type, since then only the first
allocation must be "new byte[x]". It is also more consistent, since
at present some reallocations obtain a new array (and lose block
information), while others do not (and preserve block information).
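To make the scenario concrete, here is a minimal sketch. It assumes the core.memory interface of present-day druntime (GC.addrOf, GC.getAttr, GC.setAttr, GC.BlkAttr.NO_SCAN), not the Tango or Phobos runtime discussed above, and the helper name isNoScan is made up for illustration. It allocates the block as byte[], grows it through a void[] reference, and reports whether the "no pointers" attribute is set before and after. What it prints depends on the runtime version, so no output is claimed here.

```d
import core.memory : GC;
import std.stdio : writeln;

// Hypothetical helper: true if the GC block holding `a` is marked as
// containing no pointers (and so is skipped during scanning).
bool isNoScan(const(void)[] a)
{
    // addrOf maps a possibly interior pointer to the base of its block;
    // for null or non-GC memory it yields null and getAttr returns 0.
    return (GC.getAttr(GC.addrOf(a.ptr)) & GC.BlkAttr.NO_SCAN) != 0;
}

void main()
{
    // First allocation is typed, so the runtime knows it holds no pointers.
    void[] buf = new byte[](1024);
    writeln("after new byte[]:         NO_SCAN = ", isNoScan(buf));

    // Growing through the void[] reference may move the data to a new
    // block; whether the attribute survives is up to the runtime.
    buf.length = 1024 * 1024;
    writeln("after growing via void[]: NO_SCAN = ", isNoScan(buf));

    // Manual workaround if the attribute was lost: set it again.
    if (!isNoScan(buf))
        GC.setAttr(GC.addrOf(buf.ptr), GC.BlkAttr.NO_SCAN);
    writeln("after workaround:         NO_SCAN = ", isNoScan(buf));
}
```

Setting the attribute by hand is only safe when the buffer really holds no pointers, which is exactly the information the runtime loses when it keys off void[].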
Sean