[D-runtime] Proposed changes to GC interface
Steve Schveighoffer
schveiguy at yahoo.com
Fri Aug 6 13:52:31 PDT 2010
----- Original Message ----
> From: Sean Kelly <sean at invisibleduck.org>
> To: D's runtime library developers list <d-runtime at puremagic.com>
> Sent: Fri, August 6, 2010 4:17:43 PM
> Subject: Re: [D-runtime] Proposed changes to GC interface
>
> On Aug 6, 2010, at 12:34 PM, Steve Schveighoffer wrote:
>
> >>
> >> From: Sean Kelly <sean at invisibleduck.org>
> >
> > [snip]
> >
> >>
> >> gc_alloc() and gc_allocn() would become the default GC allocator routines and
> >> would set any necessary flags based on the supplied type, store the pointer
> >> bitmap, initialize the block based on ti.init[], etc.
> >
> > OK, but why do we need bitmaps/bits if we can just store that info in the
> > TypeInfo? I mean, gc_allocn is pretty much the same as lifetime's
> > _d_newArrayT. It uses the bits set in the typeinfo to determine the NO_SCAN
> > flag. Why can't the GC use those bits?
>
> It can. I mostly wasn't sure if we wanted to store a full pointer for all
> applicable blocks (to reference TypeInfo) or if we should mix bitfields and
> TypeInfo as needed.
This poses problems for some of the other things you said. For example, if you
want to append to a block that doesn't store a typeinfo, but you also don't
want to pass a typeinfo in, how do you populate the typeinfo for the larger
block?
I think we may need two kinds of small blocks: ones with typeinfo and ones with
just bits. The ones with just bits could not be used as arrays (i.e. they could
only hold single elements).
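To make the two kinds concrete, here is a rough sketch of the per-block
metadata each might carry. All of the names here are invented for illustration;
nothing like this exists in druntime today:

    // invented names, illustration only
    enum BlockKind : ubyte
    {
        bitsOnly, // single element: just attribute bits (e.g. NO_SCAN, FINALIZE)
        typed     // array-capable: carries a TypeInfo reference and a used length
    }

    struct SmallBlockMeta
    {
        BlockKind kind;
        ubyte     attrBits;   // always present, cheap

        // only stored when kind == BlockKind.typed; a bitsOnly block would
        // omit these entirely to avoid the pointer-sized overhead
        TypeInfo  typeInfo;   // element type the block was allocated with
        size_t    usedLength; // elements in use, needed for appending
    }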
>
> > I'd think that bits are only useful for small blocks where the cost of storing
> > the TypeInfo pointer is greater than 10% overhead. But if you need to go from
> > block -> typeinfo, this may be a requirement (see questions below).
> >
> > In addition to these, if the GC is going to handle appending, then it should
> > handle how the length is stored. This means, we need functions to get and set
> > the length to support append and the capacity/assumeSafeAppend functions.
>
> gc_allocn() would handle the initial allocation for array types (I tried to
> wedge both behaviors into one function call and couldn't sort it out to my
> satisfaction), but you're right that there would need to be an additional call
> in there. Maybe gc_extend() could be rewritten to use element counts instead
> of byte counts, assuming we expect to always have TypeInfo available for
> arrays?
We definitely need to support capacity, which can be used to tune an append or
to determine whether an append is going to decouple an array from its original
block. This helps in writing deterministic code.
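As a minimal sketch of the kind of deterministic check I mean, using capacity
and assumeSafeAppend as they already exist in druntime:

    void example()
    {
        int[] a = new int[50];
        int[] b = a[0 .. 25];   // a slice into the same block

        // appending to b in place would stomp a[25 .. 50], so the runtime
        // reports no usable capacity; an append here would reallocate and
        // decouple b from a's block
        assert(b.capacity == 0);

        // if we know a[25 .. 50] is no longer needed, we can say so, and the
        // next append becomes deterministic: in place, no reallocation
        assumeSafeAppend(b);
        assert(b.capacity >= 50);
        b ~= 42;
    }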
But I see where you are going with gc_extend: it's an abstraction of the
appending that the runtime currently handles. I think the GC should present a
consistent interface, so if we're dealing in terms of elements on allocation,
we should deal in terms of elements on extension as well.
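Hypothetical signatures only, to show what "elements everywhere" might look
like; none of these names are final (or existing):

    extern (C)
    {
        // allocate room for `count` elements described by `ti`
        void*  gc_allocn(const TypeInfo ti, size_t count);

        // grow in place: at least minCount, ideally desiredCount elements;
        // returns the new capacity in elements, or 0 if it couldn't grow
        size_t gc_extendn(void* p, const TypeInfo ti,
                          size_t minCount, size_t desiredCount);

        // how many elements the block can hold without reallocating
        size_t gc_capacityn(void* p, const TypeInfo ti);
    }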
>
> >> gc_realloc() has never seen much use and promises to become increasingly more
> >> complicated as TypeInfo is added, etc. I'd prefer to just drop it and let the
> >> user call gc_extend() if he wants to resize memory in place. This would
> >> require allowing gc_extend() to be used on sub-page sized blocks, but this
> >> seems reasonable anyway. If I have a 150 byte array in a 256 byte block I
> >> should be allowed to extend it to 256 bytes. Doing so is basically a no-op,
> >> but it frees the user from having to query the block size to determine how to
> >> handle an array append operation, for example.
> >
> > I don't think gc_extend's semantics should be changed. If one wants to extend
> > into the block, one should just get the capacity and use the block. This of
> > course is only on blocks that are allocated via gc_malloc. Blocks allocated
> > via gc_alloc[n] should only be extendable via the runtime functions to keep
> > the TypeInfo sane.
>
> But the runtime functions would in turn call gc_extend(), right? One of the
> things I was thinking of was that gc_extend() currently takes two arguments, a
> required and a desired size. But as far as I know the same value is always
> passed to both. If this is true, why not just eliminate one?
Heh, in my last update to druntime I started trying to use the min and desired
sizes. What I found was that the runtime would give up too early when trying to
fulfill the desired size, so I fixed it to honor the desired size when possible :)
I think an extend function that properly supports both a minimum and a desired
size would be very important.
The algorithm would be something like this (see the sketch just below):
- if the block already has room for at least the minimum (without stomping),
  extend in place, going as far toward the desired size as possible;
- else if more pages can be attached in place, attach as many as needed to try
  to reach the desired size;
- else reallocate the block at the desired size.
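A rough sketch of that logic, written against the public core.memory wrappers
purely for illustration; grow() is a hypothetical helper, not anything in
druntime, and `used` is the number of bytes currently holding valid data:

    import core.memory : GC;
    import core.stdc.string : memcpy;

    void* grow(void* p, size_t used, size_t minBytes, size_t desiredBytes)
    {
        // 1. the block may already have room for the minimum without stomping
        auto curSize = GC.sizeOf(p);
        if (curSize >= used + minBytes)
            return p;   // use the existing slack, up to the desired size

        // 2. try to attach more memory to the block in place, asking for the
        //    desired amount but accepting the minimum
        if (GC.extend(p, used + minBytes - curSize,
                         used + desiredBytes - curSize) != 0)
            return p;

        // 3. give up and reallocate at the desired size, copying the old data
        auto np = GC.malloc(used + desiredBytes);
        memcpy(np, p, used);
        return np;
    }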
I do believe the algorithm that grows an array geometrically (so the number of
reallocations stays logarithmic) should be left outside the GC. The GC
shouldn't make any assumptions it isn't told about.
But to answer your original point, gc_extend is currently used to add more pages
to a block *in place* without reallocating. For the gc_malloc zealots, we need
to keep supporting this somehow, even if we rename it to something else and
commandeer gc_extend for array appending.
> > Overall I think this is a good idea. I think actually it's probably necessary
> > to integrate the precise scanning and array append stuff together.
> >
> > Some things to think about, while they're fresh in my mind:
> > If someone appends to an array with a different typeinfo than it was
> > originally allocated with, what happens?
>
> This shouldn't be allowed. It's why I don't think gc_extend() should accept
> typeinfo, etc.
Well, it's not illegal to do this:

    auto arr = new U[50];
    auto tarr = cast(T[]) arr;
    tarr ~= T(1, 2, 3);

So we have to either handle this, or reject it, but it has to be done at
runtime. I think the *wrong* approach is to assume the append is trying to add
more U's. So at the very least, gc_extend needs to verify the typeinfo
matches. The compiler already passes the typeinfo to the function, so let's
utilize that info.
Come to think of it, should we provide a function to "retype" a block?
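A purely hypothetical sketch of what that could look like at the interface
level; neither of these functions exists, they just illustrate "verify the
typeinfo" and "retype a block":

    extern (C)
    {
        // append `count` elements; verifies that `ti` matches the TypeInfo
        // the block was allocated with, and fails (or reallocates) if not
        void* gc_append(void* p, const TypeInfo ti, size_t count);

        // explicitly re-type a block, e.g. after a deliberate cast(T[])
        bool gc_retype(void* p, const TypeInfo newTi);
    }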
>
> > What happens when you do a new array of void[]?
>
> Same as now I'd think.
I'd rethink this. Since we're storing the typeinfo with a block, why should
void[] be typed as containing pointers? You can always do void*[]...
In fact, you could actually statically disallow new void[x] and probably not
lose much (one can always call gc_malloc).
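For illustration, the three flavors side by side; the first two lines show
today's behavior, and the GC.malloc call is the suggested escape hatch for
untyped storage:

    import core.memory : GC;

    void example()
    {
        auto raw   = new void[256];  // opaque elements, so the block is
                                     // conservatively scanned for pointers
        auto ptrs  = new void*[32];  // explicitly an array of pointers:
                                     // scanned, and says so in its type
        auto bytes = GC.malloc(256, GC.BlkAttr.NO_SCAN); // untyped storage
                                                         // the GC won't scan
    }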
>
> > Should you be able to retrieve the typeinfo a block was allocated with?
>
> Seems reasonable to expect so.
What about blocks that are tiny, where the TypeInfo pointer would be 33%
overhead? If we only store bits for those, there is no TypeInfo to retrieve.
Again, consider the suggestion that when allocating *arrays* we may always need
TypeInfo, but when allocating single elements, we may just want to store bits.
-Steve