[D-runtime] Proposed changes to GC interface

Fri Aug 6 13:17:43 PDT 2010

On Aug 6, 2010, at 12:34 PM, Steve Schveighoffer wrote:

>> From: Sean Kelly <sean at invisibleduck.org>
>
> [snip]
>
>> gc_alloc() and gc_allocn() would become the default GC allocator routines and 
>> would set any necessary flags based on the supplied type, store the pointer 
>> bitmap, initialize the block based on ti.init[], etc.
> 
> OK, but why do we need bitmaps/bits if we can just store that info in the 
> TypeInfo?  I mean, gc_allocn is pretty much the same as lifetime's 
> _d_newArrayT.  It uses the bits set in the typeinfo to determine the NO_SCAN 
> flag.  Why can't the GC use those bits?

It can.  I mostly wasn't sure if we wanted to store a full pointer for all applicable blocks (to reference TypeInfo) or if we should mix bitfields and TypeInfo as needed.
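
To put rough numbers on that trade-off, here is a minimal sketch that computes how much of a block a single stored pointer consumes and compares it with the 10% threshold raised in the next quoted paragraph. The function names are hypothetical, and the block and pointer sizes in the note below are assumptions for illustration, not druntime's actual size classes.

```c
#include <assert.h>
#include <stddef.h>

/* Percentage of a block taken up by one stored pointer
   (standing in for a per-block TypeInfo reference). */
double typeinfo_overhead_pct(size_t block_size, size_t ptr_size)
{
    assert(block_size > 0);
    return 100.0 * (double)ptr_size / (double)block_size;
}

/* Returns 1 when the pointer would cost more than threshold_pct of the
   block, i.e. when compact bits look like the cheaper choice. */
int prefers_bits(size_t block_size, size_t ptr_size, double threshold_pct)
{
    return typeinfo_overhead_pct(block_size, ptr_size) > threshold_pct;
}
```

With an 8-byte pointer, a 64-byte block pays 12.5% and a 128-byte block pays 6.25%, so under these assumptions only the smallest blocks would fall back to bitfields.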

> I'd think that bits are only useful for small blocks where the cost of storing 
> the TypeInfo pointer is greater than 10% overhead.  But if you need to go from 
> block -> typeinfo, this may be a requirement (see questions below).
> 
> In addition to these, if the GC is going to handle appending, then it should 
> handle how the length is stored.  This means, we need functions to get and set 
> the length to support append and the capacity/assumeSafeAppend functions.

gc_allocn() would handle the initial allocation for array types (I tried to wedge both behaviors into one function call and couldn't sort it out to my satisfaction), but you're right that there would need to be an additional call in there.  Maybe gc_extend() could be rewritten to use element counts instead of byte counts, assuming we expect to always have TypeInfo available for arrays?
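
As a rough illustration of what an element-count interface would have to do internally, the sketch below converts a requested element count into bytes, guarding against overflow, and checks it against the block's capacity. The function name and parameters are hypothetical; elem_size stands in for whatever size the TypeInfo would report.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if wanted_elems elements of elem_size bytes each fit in a
   block of capacity_bytes, 0 otherwise (including on overflow). */
int toy_can_extend_elems(size_t capacity_bytes, size_t elem_size,
                         size_t wanted_elems)
{
    assert(elem_size > 0);
    /* Reject counts whose byte size would wrap around size_t. */
    if (wanted_elems > SIZE_MAX / elem_size)
        return 0;
    return wanted_elems * elem_size <= capacity_bytes;
}
```

Doing the multiplication in one place is the main attraction: callers pass a count and cannot get the byte arithmetic wrong.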

>> gc_realloc() has never seen much use and promises to become increasingly more 
>> complicated as TypeInfo is added, etc.  I'd prefer to just drop it and let the 
>> user call gc_extend() if he wants to resize memory in place.  This would require
>> allowing gc_extend() to be used on sub-page sized blocks, but this seems
>> reasonable anyway.  If I have a 150 byte array in a 256 byte block I should be
>> allowed to extend it to 256 bytes.  Doing so is basically a no-op, but it frees
>> the user from having to query the block size to determine how to handle an array
>> append operation, for example.
> 
> I don't think gc_extend's semantics should be changed.  If one wants to extend 
> into the block, one should just get the capacity and use the block.  This of 
> course is only on blocks that are allocated via gc_malloc.  Blocks allocated via 
> gc_alloc[n] should only be extendable via the runtime functions to keep the 
> TypeInfo sane.

But the runtime functions would in turn call gc_extend(), right?  One of the things I was thinking of was that gc_extend() currently takes two arguments, a required and a desired size.  But as far as I know the same value is always passed to both.  If this is true, why not just eliminate one?
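
A minimal sketch of the single-size variant, applied to the 150-byte array in a 256-byte block from the quoted example. ToyBlock and toy_extend are hypothetical names modelling the bookkeeping only; this is not druntime's block layout or the real gc_extend() signature.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of a sub-page block: `used` bytes are live inside a
   fixed-size bin of `capacity` bytes. */
typedef struct {
    size_t used;
    size_t capacity;
} ToyBlock;

/* Grow the in-use region to `wanted` bytes in place.
   Returns the new in-use size, or 0 if the request does not fit,
   in which case the caller would have to allocate and copy. */
size_t toy_extend(ToyBlock *blk, size_t wanted)
{
    assert(blk != NULL);
    if (wanted > blk->capacity)
        return 0;              /* cannot grow in place */
    if (wanted > blk->used)
        blk->used = wanted;    /* pure bookkeeping, no memory moves */
    return blk->used;
}
```

Extending a {150, 256} block to 256 succeeds without touching memory, while asking for 257 returns 0 and leaves the block unchanged.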

>> Finally, I really want to change the APPENDABLE bit to 
>> NO_APPEND/STATIC/SINGLE/whatever since the default (zero value) behavior should
>> be that an allocated block is not an array.
> 
> I think there is a misunderstanding here.  A 0 APPENDABLE bit means it's not
> appendable (i.e. not an array).  Isn't that what you want?

Oops, you're right.

> Overall I think this is a good idea.  I think actually it's probably necessary 
> to integrate the precise scanning and array append stuff together.
> 
> Some things to think about, while they're fresh in my mind:
> If someone appends to an array with a different typeinfo than it was originally 
> allocated with, what happens?

This shouldn't be allowed.  It's why I don't think gc_extend() should accept typeinfo, etc.

> What happens when you do a new array of void[]?

Same as now, I'd think.

> Should you be able to retrieve the typeinfo a block was allocated with?

Seems reasonable to expect so.

> Dare I say it, but this is going to once again separate Tango from Phobos, since
> Tango does not use the current druntime, only an old version of it.
> Should we care?  I say no; I strongly believe Tango is stuck on D1 forever.

I no longer care about retaining compatibility with Tango, for exactly the same reason.  I do still think there's value in having a separate runtime and standard library, however, because druntime encapsulates the minimum support code necessary for a fully functional D application.  With the all-in-one approach it's just too easy for half the standard library to be pulled into every app simply because some low-level routine wants to do some string formatting or whatever, and I think we'd lose a chunk of potential C/C++ converts if this were the case.