[D-runtime] Proposed changes to GC interface

Steve Schveighoffer schveiguy at yahoo.com
Fri Aug 6 13:52:31 PDT 2010





----- Original Message ----
> From: Sean Kelly <sean at invisibleduck.org>
> To: D's runtime library developers list <d-runtime at puremagic.com>
> Sent: Fri, August 6, 2010 4:17:43 PM
> Subject: Re: [D-runtime] Proposed changes to GC interface
> 
> On Aug 6, 2010, at 12:34 PM, Steve Schveighoffer wrote:
> 
> >> 
> >> From: Sean Kelly <sean at invisibleduck.org>
> > 
> > [snip]
> > 
> >> 
> >> gc_alloc() and gc_allocn() would become the default GC allocator routines and
> >> would set any necessary flags based on the supplied type, store the pointer
> >> bitmap, initialize the block based on ti.init[], etc.
> > 
> > OK, but why do we need bitmaps/bits if we can just store that info in the
> > TypeInfo?  I mean, gc_allocn is pretty much the same as lifetime's
> > _d_newArrayT.  It uses the bits set in the typeinfo to determine the NO_SCAN
> > flag.  Why can't the GC use those bits?
> 
> It can.  I mostly wasn't sure if we wanted to store a full pointer for all
> applicable blocks (to reference TypeInfo) or if we should mix bitfields and
> TypeInfo as needed.

This poses problems for some of the other things you said.  For example, if you 
want to append to a block that doesn't store a typeinfo, but you don't want to 
pass a typeinfo in, how do you populate the typeinfo for the larger block?

I think perhaps we may need two types of small blocks, ones with typeinfo and 
ones with just bits.  The ones with just bits could not be used as arrays (i.e. 
could only be single elements).
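
To make that concrete, here's a rough sketch of what I'm picturing (all the 
names are invented for illustration, not a proposed API):

enum BlkKind { bitsOnly, typed }   // hypothetical

struct BlkMeta                     // hypothetical per-block metadata
{
    BlkKind kind;
    union
    {
        uint attrBits;   // bits-only blocks: just NO_SCAN-style flags
        TypeInfo ti;     // typed blocks: full TypeInfo, usable for arrays
    }
}

Bits-only blocks would never be handed to the array-append path, so they never 
have to answer the "what TypeInfo is this?" question.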

> 
> > I'd think that bits are only useful for small blocks where the cost of storing
> > the TypeInfo pointer is greater than 10% overhead.  But if you need to go from
> > block -> typeinfo, this may be a requirement (see questions below).
> > 
> > In addition to these, if the GC is going to handle appending, then it should
> > handle how the length is stored.  This means, we need functions to get and set
> > the length to support append and the capacity/assumeSafeAppend functions.
> 
> gc_allocn() would handle the initial allocation for array types (I tried to
> wedge both behaviors into one function call and couldn't sort it out to my
> satisfaction), but you're right that there would need to be an additional call
> in there.  Maybe gc_extend() could be rewritten to use element counts instead
> of byte counts, assuming we expect to always have TypeInfo available for
> arrays?

We definitely need to support capacity, which can be used to tune an append, or 
determine whether an append is going to decouple an array from its original.  
This helps to write deterministic code.

But I see where you are going with gc_extend: it's like an abstraction of the 
appending that the runtime currently handles.  I think the GC should present a 
consistent interface, so if we're dealing in terms of elements on allocation, 
we should deal in elements on extension also.
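
Something along these lines, purely as a sketch (the signatures are invented to 
illustrate the idea, not actual druntime declarations):

extern (C) size_t gc_capacity(void* p);
    // capacity in elements, for typed (array-capable) blocks

extern (C) size_t gc_extend(void* p, size_t minElems, size_t desiredElems);
    // grow in place by at least minElems, up to desiredElems; returns the
    // new capacity in elements, or 0 if the block could not be extended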

> 
> >> gc_realloc() has never seen much use and promises to become increasingly more
> >> complicated as TypeInfo is added, etc.  I'd prefer to just drop it and let the
> >> user call gc_extend() if he wants to resize memory in place.  This would
> >> require allowing gc_extend() to be used on sub-page sized blocks, but this
> >> seems reasonable anyway.  If I have a 150 byte array in a 256 byte block I
> >> should be allowed to extend it to 256 bytes.  Doing so is basically a no-op,
> >> but it frees the user from having to query the block size to determine how to
> >> handle an array append operation, for example.
> > 
> > I don't think gc_extend's semantics should be changed.  If one wants to extend
> > into the block, one should just get the capacity and use the block.  This of
> > course is only on blocks that are allocated via gc_malloc.  Blocks allocated
> > via gc_alloc[n] should only be extendable via the runtime functions to keep the
> > TypeInfo sane.
> 
> But the runtime functions would in turn call gc_extend(), right?  One of the
> things I was thinking of was that gc_extend() currently takes two arguments, a
> required and a desired size.  But as far as I know the same value is always
> passed to both.  If this is true, why not just eliminate one?

Heh, in my last update to druntime, I started trying to use the min and desired 
sizes.  What I found was that the runtime would give up too early when trying to 
fulfill the desired size.  So I fixed it to obey the desired size when possible 
:)

I think an extend function that properly supports min and desired would be very 
important.

The algorithm would be something like:

if the block itself has room to extend by at least the minimum (without 
stomping), do that, taking as much as possible up to the desired size;
else if more pages can be attached in place, attach as many as needed to try to 
reach the desired size;
else reallocate the block at the desired size.
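
In rough code (the three helpers are hypothetical stand-ins for the in-block, 
page-level, and realloc cases, not existing GC functions):

size_t extendInBlock(void* p, size_t minSize, size_t desiredSize);  // hypothetical
size_t extendPages(void* p, size_t minSize, size_t desiredSize);    // hypothetical
size_t reallocBlock(void* p, size_t desiredSize);                   // hypothetical

size_t grow(void* p, size_t minSize, size_t desiredSize)
{
    // 1. room in the current block, no stomping: take as much as we can,
    //    up to desiredSize, as long as at least minSize fits
    auto got = extendInBlock(p, minSize, desiredSize);
    if (got >= minSize)
        return got;

    // 2. otherwise try to attach more pages in place, aiming for desiredSize
    got = extendPages(p, minSize, desiredSize);
    if (got >= minSize)
        return got;

    // 3. give up on in-place growth and reallocate at desiredSize
    return reallocBlock(p, desiredSize);
}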

I do believe the algorithm to logarithmically grow an array size should be left 
outside the GC.  The GC shouldn't make any assumptions beyond what it's told.

But to answer your original point, gc_extend is currently used to add more pages 
to a block *in place* without reallocating.  For those gc_malloc zealots, we 
need to keep supporting this somehow, even if we rename it to something else and 
commandeer gc_extend for array appending.
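
For reference, the in-place extension we have today looks roughly like this 
through core.memory (if I remember the signature right -- byte counts, and a 
return of 0 means "could not extend without moving"):

import core.memory : GC;

void demo()
{
    auto p = GC.malloc(4096);

    // ask to grow the block in place by at least 4096 more bytes, up to
    // 8192 more; result is the new total block size, or 0 if it couldn't
    // be done without moving the block
    auto newSize = GC.extend(p, 4096, 8192);
    if (newSize == 0)
    {
        // no adjacent space -- the caller has to fall back to realloc + copy
    }
}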

> > Overall I think this is a good idea.  I think actually it's probably necessary
> > to integrate the precise scanning and array append stuff together.
> > 
> > Some things to think about, while they're fresh in my mind:
> > If someone appends to an array with a different typeinfo than it was originally
> > allocated with, what happens?
> 
> This shouldn't be allowed.  It's why I don't think gc_extend() should accept
> typeinfo, etc.

Well, it's not illegal to do this:

auto arr = new U[50];
auto tarr = cast(T[]) arr;

tarr ~= T(1, 2, 3);

So we have to either handle this, or reject it, but it has to be done at 
runtime.  I think the *wrong* approach is to assume the append is trying to add 
more U's.  So at the very least, gc_extend needs to verify the typeinfo 
matches.  The compiler already passes the typeinfo to the function, so let's 
utilize that info.

Come to think of it, should we provide a function to "retype" a block?
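
To make the idea concrete, something like this is what I have in mind (all the 
names are hypothetical, nothing here exists today):

extern (C) TypeInfo gc_getType(void* p);          // hypothetical: TypeInfo the block was allocated with
extern (C) void gc_retype(void* p, TypeInfo ti);  // hypothetical: deliberately change that TypeInfo

// the append path would reject a mismatch instead of silently assuming
// the caller wants more of the original element type
bool appendAllowed(void* p, TypeInfo ti)
{
    return gc_getType(p) == ti;   // TypeInfo.opEquals, not just identity
}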

> 
> > What happens when you do a new array of void[]?
> 
> Same as now I'd think.

I'd rethink this.  Since we're storing the typeinfo with a block, why should 
void[] be typed as containing pointers?  You can always do void*[]...

In fact, you could actually statically disallow new void[x] and probably not 
lose much (one can always call gc_malloc).
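
That is, something like the following, which makes the scanning decision 
explicit instead of burying it in new void[] (using today's core.memory names, 
roughly):

import core.memory : GC;

void demo()
{
    // instead of new void[1024] (which the GC currently has to scan
    // conservatively), request untyped memory and pick the behavior up front
    void[] noScan  = GC.malloc(1024, GC.BlkAttr.NO_SCAN)[0 .. 1024];
    void[] scanned = GC.malloc(1024)[0 .. 1024];
}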

> 
> > Should you be able to retrieve the typeinfo a block was allocated with?
> 
> Seems reasonable to expect so.

What about tiny blocks, where the TypeInfo pointer would be 33% overhead, if 
we're only going to store bits for them?

Again, consider the suggestion that when allocating *arrays* we may always need 
TypeInfo, but when allocating single elements, we may just want to store bits.

-Steve