(Semi) precise GC [was: Re: Std Phobos 2 and logging library?]
Robert Jacques
sandford at jhu.edu
Mon Apr 13 18:18:21 PDT 2009
On Mon, 13 Apr 2009 14:54:57 -0400, Frits van Bommel
<fvbommel at remwovexcapss.nl> wrote:
> Sean Kelly wrote:
>> Leandro Lucarella wrote:
>>> But right now gc_malloc() doesn't take any TypeInfo argument. I can't
>>> see
>>> where I can get the TypeInfo in the first place =/
>> The call would have to be modified. Right now the best you can do is
>> pass BlkAttr.NO_SCAN. And storing a pointer per block could add a good
>> bit of bookkeeping overhead for small objects, of course. Perhaps the
>> TypeInfo array could be converted to a bitmap or some such.
>
> Let's see, you'd need 2 bits per pointer-sized block of bytes, to encode
> these possibilities:
> a) Yeah, this is a pointer
> b) Nope, not a pointer
> c) Maybe a pointer (union, void[])
> c2) (optional) A (somehow) explicitly pinned pointer (treated identical
> to (c) for GC purposes; needs to be followed during marking, but data
> pointed to can't be moved)
> d) (optional, since we have a value left) This is a weak pointer
>
> I'd split these up as such: One bit to indicate that it can be read as a
> pointer (and should thus be followed when marking, for instance) and one
> to indicate it can be written as a pointer (so it can be moved for (a)
> or nulled for (d)). That gives us these values for the two-bit field:
> enum PtrBits {
> // Actual values
> JustData = 0b00,
> MaybePointer = 0b01,
> PinnedPointer = 0b01,
> WeakPointer = 0b10,
> Pointer = 0b11,
>
> // For '&' tests
> ReadableFlag = 0b01,
> WritableFlag = 0b10,
> }
>
> Like I said, this would cost 2 bits per pointer-sized chunk, so 1/16 of
> size for 32-bit systems and 1/32th of the memory block size for 64-bit
> systems. It'd have to be rounded up to a whole number of bytes of
> course, and possibly T.alignof if stored at the start of the block.
> (Storing it at the end of the block would avoid that)
>
> This could be bounded to one pointer worth of memory per block if the GC
> treats blocks > 16*4 = 64 bytes (on 32-bit systems) or > 32*8 = 256
> bytes (on 64-bit systems) specially by just storing the raw TypeInfo
> reference instead of the bitfield for the memory block.
> (Implementer's choice on what to do for
> (size_t.sizeof-1)*4*size_t.sizeof to size_t.sizeof^2 * 4 bytes, where
> the bit-encoded data takes up the same number of bytes as a pointer
> would)
An alternative to this is to encode the information in ClassInfo and use
it instead. (You'd have to create a fake ClassInfo for structs and
arrays.) Then the GC only has to track the start of each object (i.e. the
beginning of a block in the current GC). The advantage is that this has 0
storage requirements for objects and on average < 4 bytes for structs and
arrays (thanks to the coarse block sizes of the current GC).
More information about the Digitalmars-d
mailing list