GC Blacklisting

Vladimir Panteleev vladimir at thecybershadow.net
Fri Feb 25 08:59:44 PST 2011


On Fri, 25 Feb 2011 18:30:36 +0200, dsimcha <dsimcha at yahoo.com> wrote:

> == Quote from Vladimir Panteleev (vladimir at thecybershadow.net)'s article
>> P.S. I'm currently in the process of tracking down a memory corruption
>> bug, which *might* result in a GC patch for D1. I'm also instinctively
>> worried that touching the GC code may introduce new memory corruption
>> bugs, which can be EXTREMELY hard to find. I've been chasing this one
>> for 4 years.
>
> I doubt it's a GC bug.  If it's not a bug in your code, I'd be more
> inclined to assume it's a codegen bug, simply because the codegen is much
> larger and more complex, and there are more opportunities for weird bugs
> that can only be reproduced under very specific circumstances to creep in.
> Once you get past the superficial cruftiness and unreadability of the
> codebase and get a good conceptual model of it, D's GC is actually pretty
> simple.

That's what I've been telling myself for the past few years as well. (I've  
written patches and a memory debugger for D and even attempted writing my  
own GCs, so I'm no stranger to D's GC.)

> Also, I've been testing my patches by using the Phobos,
> std.parallelism/parallelfuture, and dstats unittests, and by simply eating
> my own dogfood (i.e. using my modified GC's when running some simulations
> and stuff).  So far, so good.  Unfortunately, we don't have a specific GC
> test suite, but IMHO if it works on this much real-world code, it's
> unlikely that I've created any bugs.

How can you be so sure this is enough? The particular manifestation of the
bug I was examining crashed my application 5 hours in. Working backwards
from the crash:

- The GC attempted to traverse a free list which had ASCII text in it.
- The text was there because the item had been allocated, but it occurred
  in the free list twice (so the first instance was used by the app to
  store text).
- It occurred twice because an already freed (GC'd) object was manually
  deleted again when an element was removed from an associative array.
- It was freed in the first place because the GC never reached it, because
  its "parent" was marked as NOSCAN.
- The parent was marked as NOSCAN because the GC relies on NOSCAN being
  cleared on freed objects, and allocating in a destructor called during a
  GC breaks that assumption (and messes things up generally).
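
To illustrate just the double-free step of that chain, here is a toy model
in Python. It is not the D GC's code: ToyPool, its methods and
double_free_demo are names made up for this sketch, and the only assumption
it makes is that free blocks store the link to the next free block in their
own storage. Freeing the same block twice leaves it on the free list after
it has been handed out, so whatever the application writes into it is later
read back as a link.

```python
class ToyPool:
    """Toy fixed-size-block pool (hypothetical, not the D GC).

    Free blocks are chained through their own storage: cells[i] holds the
    index of the next free block, or None at the end of the list. Once a
    block is allocated, cells[i] holds whatever the application stores.
    """

    def __init__(self, n_blocks):
        self.cells = [None] * n_blocks
        self.free_head = None
        # Put every block on the free list, lowest index first.
        for i in reversed(range(n_blocks)):
            self.free(i)

    def free(self, i):
        # Push block i onto the free list.
        # Nothing checks whether it is already on the list.
        self.cells[i] = self.free_head
        self.free_head = i

    def alloc(self):
        # Pop the head of the free list, trusting the link stored in it.
        i = self.free_head
        if i is None:
            raise MemoryError("pool exhausted")
        link = self.cells[i]
        if link is not None and not isinstance(link, int):
            # A real allocator would follow the bogus "pointer" instead.
            raise RuntimeError(f"free list corrupted: link is {link!r}")
        self.free_head = link
        return i


def double_free_demo():
    pool = ToyPool(4)
    a = pool.alloc()
    pool.free(a)                  # first free (think: collector sweep)
    pool.free(a)                  # second free (think: manual delete)
    b = pool.alloc()              # block is handed to the application
    pool.cells[b] = "some text"   # the application stores text in it
    head_after = pool.free_head   # the same block is still the list head
    try:
        pool.alloc()
        error = None
    except RuntimeError as e:
        error = str(e)
    return b, head_after, error
```

In this model the second free makes the block link to itself, so the block
stays at the head of the free list even after it has been allocated. The
toy raises an exception when it sees text where a link should be; a real
allocator has no such check and simply follows it.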

Are you at least running your tests with the GC debug options enabled  
(such as MEMSTOMP)? I hope your patches don't break them, either.

In case you missed my other reply, what I was aiming at is that something  
must be done when allocating from destructors. It must either reliably  
work or immediately fail, and definitely not corrupt the GC's state.  
Phobos allocates in destructors in a few places as well (std.zlib being  
one).
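
A minimal sketch of the "immediately fail" option, again as a toy model in
Python and not as a patch to the real GC. ToyGC, AllocationDuringCollection
and AllocatingFinalizer are hypothetical names. The idea is only that the
allocator checks a flag set for the duration of a collection and refuses to
allocate while it is set, instead of carrying on with inconsistent state.

```python
class AllocationDuringCollection(RuntimeError):
    """Raised when something allocates while a collection is running."""


class ToyGC:
    """Toy collector (hypothetical) that fails fast on re-entrant allocation."""

    def __init__(self):
        self.collecting = False
        self.live = []

    def alloc(self, obj):
        # Fail immediately rather than touch allocator state mid-collection.
        if self.collecting:
            raise AllocationDuringCollection("allocation during collection")
        self.live.append(obj)
        return obj

    def collect(self, is_reachable):
        self.collecting = True
        try:
            survivors = []
            for obj in self.live:
                if is_reachable(obj):
                    survivors.append(obj)
                else:
                    # Run the finalizer, if the object has one.
                    finalizer = getattr(obj, "finalize", None)
                    if finalizer is not None:
                        finalizer(self)
            self.live = survivors
        finally:
            # Always clear the flag, even if a finalizer raised.
            self.collecting = False


class AllocatingFinalizer:
    """Object whose finalizer allocates, like the destructors discussed above."""

    def finalize(self, gc):
        gc.alloc(object())
```

Collecting an unreachable AllocatingFinalizer in this model raises
AllocationDuringCollection at the point of the offending allocation, which
is much easier to diagnose than a crash hours later.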

-- 
Best regards,
  Vladimir                            mailto:vladimir at thecybershadow.net

