Debugging memory leak.

Mon Oct 8 15:46:12 PDT 2007

Frits van Bommel wrote:
> Sean Kelly wrote:
>> Frits van Bommel wrote:
>>> Sean Kelly wrote:
>>>
>>>>>   - Try Tango?  Is the GC different there?
>>>>
>>>> Somewhat, but void[] arrays are still treated as if they have pointers.
>>>
>>> But AFAICT tango.io.compress.Zlib doesn't allocate any of those, just 
>>> ubyte[] arrays, and exception classes. (The unittest does use a 
>>> MemoryConduit, which internally uses a void[], but nothing else 
>>> should allocate void[]s for that module)
>>
>> Yup.  However, an annoying problem still exists with Buffer.  
>> Basically, this class maintains a void[] reference to a block of 
>> memory allocated as a byte[].  However, if the block is resized for 
>> any reason, the type doing the resizing is used to determine whether 
>> the newly allocated block contains pointers.  I've been meaning to 
>> change the Tango runtime and GC to preserve array block attributes 
>> across reallocations, but it's a somewhat involved process and I 
>> haven't gotten to it yet.
> 
> Actually, that bug (or its Phobos equivalent) seems to be partly to 
> blame as well. If you look at std.zlib, you'll see that right after 
> every "new void[whatever]", std.gc.hasNoPointers is called on the 
> result. However, there are some ~/~= operations on those buffers while 
> typed as void[]s.
> I think if the "has no pointers" bit carried over from the original 
> arrays when concatenating (if neither contains pointers[1], the result 
> doesn't contain any either) std.zlib might actually be leak-free.
> 
> 
> [1]: If an array is gc-allocated, use the attribute as set by the 
> allocation function or the user. Otherwise, use the default for the 
> static type (through TypeInfo).
> 
> 
> P.S. I think I'm starting to realize what you meant by that "a somewhat 
> involved process" comment ;).

Some of the complication comes from a desire for efficiency.  At present, 
the array reallocation routines may call two synchronized GC functions: 
gc_sizeOf and gc_malloc.  Preserving block attributes would require 
calling gc_getAttr as well, which would mean three mutex locks for a 
single append/resize operation.  What I'd like to do is create an 
aggregate routine called something like gc_describe which returns all 
the relevant information in one go, and use it in place of the separate 
gc_sizeOf and gc_getAttr calls.  The relevant block info can then be 
preserved and passed into the later call to gc_malloc, so the current 
maximum of two mutex locks would be retained.
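
To make the lock counting concrete, here is a toy model in C.  It is 
only a sketch: the toy_* functions, BlkInfo, BLK_NO_SCAN and the lock 
counter are hypothetical stand-ins, not the actual runtime or GC code, 
and the "mutex" is just a counter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Toy model only: none of these names are the real runtime API. */
enum { BLK_NO_SCAN = 1u };            /* hypothetical "has no pointers" bit */

typedef struct { size_t size; unsigned attr; } BlkHeader;
typedef struct { size_t size; unsigned attr; } BlkInfo;

static int lock_count = 0;            /* counts simulated mutex acquisitions */
static void gc_lock(void) { ++lock_count; }

/* The header is stored immediately before the user-visible block. */
static BlkHeader *header_of(void *p) { return (BlkHeader *)p - 1; }

void *toy_malloc(size_t size, unsigned attr) {
    gc_lock();
    BlkHeader *h = malloc(sizeof *h + size);
    if (!h) return NULL;
    h->size = size;
    h->attr = attr;
    return h + 1;
}

size_t   toy_sizeOf(void *p)  { gc_lock(); return header_of(p)->size; }
unsigned toy_getAttr(void *p) { gc_lock(); return header_of(p)->attr; }

/* Aggregate query: size and attributes under a single lock. */
BlkInfo toy_describe(void *p) {
    gc_lock();
    BlkInfo info = { header_of(p)->size, header_of(p)->attr };
    return info;
}

/* Attribute-preserving resize with separate queries: three locks. */
void *grow_three_locks(void *p, size_t new_size) {
    assert(p != NULL);
    size_t   old  = toy_sizeOf(p);
    unsigned attr = toy_getAttr(p);
    void *q = toy_malloc(new_size, attr);
    if (q) memcpy(q, p, old < new_size ? old : new_size);
    return q;                         /* old block is left for the collector */
}

/* Same resize using the aggregate query: two locks. */
void *grow_two_locks(void *p, size_t new_size) {
    assert(p != NULL);
    BlkInfo info = toy_describe(p);
    void *q = toy_malloc(new_size, info.attr);
    if (q) memcpy(q, p, info.size < new_size ? info.size : new_size);
    return q;                         /* old block is left for the collector */
}
```

In this model, growing a block allocated with BLK_NO_SCAN keeps the bit 
set on the new block either way; the only difference is whether 
lock_count goes up by three or by two per resize.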

The other complication will be a maintenance issue.  If some part of the 
runtime performs an array reallocation in a different way, there will be 
"holes" in the attribute-preserving functionality.  Fortunately, this 
portion of the runtime is fairly stable, so I don't foresee it being a 
problem.

As an aside, I'd originally considered replacing the whole mess with a 
call to gc_realloc, but currently gc_realloc is allowed to fail if the 
supplied pointer is to the interior of a memory block (i.e. a slice).  I 
don't want to change this, because the current scheme allows a GC 
implementation to simply call C malloc/realloc/free, which would not be 
possible if the GC were required to operate correctly on interior pointers.
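
As a rough illustration of that constraint, here is a minimal C sketch 
of an allocator layered directly on malloc/realloc.  The mini_* names 
and the fixed-size table are hypothetical, purely for illustration.  
Such an implementation only knows the base addresses it handed out, so 
the most it can do with an interior pointer is refuse the request.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_BLOCKS 64
static void *bases[MAX_BLOCKS];       /* base addresses handed out so far */

/* Return the slot holding exactly p, or -1.  find_base(NULL) finds a
   free slot, since unused slots hold NULL. */
static int find_base(void *p) {
    for (int i = 0; i < MAX_BLOCKS; ++i)
        if (bases[i] == p) return i;
    return -1;
}

void *mini_malloc(size_t size) {
    assert(size > 0);                 /* toy model: no zero-size requests */
    int slot = find_base(NULL);
    if (slot < 0) return NULL;        /* table full */
    void *p = malloc(size);
    bases[slot] = p;
    return p;
}

void *mini_realloc(void *p, size_t size) {
    assert(size > 0);                 /* toy model: no zero-size requests */
    int slot = find_base(p);
    if (p == NULL || slot < 0)
        return NULL;                  /* not a known base pointer: refuse */
    void *q = realloc(p, size);       /* safe: p came from malloc/realloc */
    if (q) bases[slot] = q;
    return q;
}
```

Here mini_realloc(block + 8, 64) returns NULL for a block obtained from 
mini_malloc, while mini_realloc(block, 64) succeeds.  Supporting the 
interior case would require mapping an arbitrary address back to its 
containing block, which plain malloc/realloc/free do not provide.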

Sean