Is a moving GC really needed?

Mon Oct 2 14:31:28 PDT 2006

xs0 wrote:
> 
> While I'm no expert, I doubt a moving GC is even possible in a systems 
> language like D.
> 
> First, if you move things around, you obviously need to be precise when 
> updating pointers, lest all hell break loose. But how do you update
> 
> union {
>     int a;
>     void* b;
> }
> 
> ? While you could forbid overlap of pointers and non-pointer data, what 
> about custom allocators, assembler, C libraries (including OS runtime!), 
> etc.?

For the union, I might suggest a more acceptable tradeoff: mandate that 
some data be inserted before every union to tell the GC which member is 
selected at any moment during program execution. Whenever a member of 
the union is assigned, the compiler inserts code to update the union's 
status. So your union would look more like this:

enum StructStatus
{
   a,
   b,
}

struct TaggedUnion
{
   StructStatus status; // or size_t, whichever works; records the live member

   union
   {
     int a;
     void* b;
   }
}

Now the GC can be precise with unions. Notice also the enum, which 
would be nice to expose to user code - AFAIK many unions are already 
wrapped in a struct like that, so for those cases there is no extra 
memory cost, provided D exposes the implicit union tag. At any rate, 
unions seem pretty rare, so the few extra bytes per union would hardly 
be noticed.
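A rough sketch of what the collector would see, in plain C (whose struct/union layout D follows). The names `set_a`, `set_b`, `gc_pointer_slot` and the `STATUS_*` constants are hypothetical: `set_a`/`set_b` stand in for the code the compiler would insert on each assignment, and `gc_pointer_slot` is how a precise GC could ask whether the union currently holds a pointer it may trace and update.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors the D example: a tag stored in front of the union. */
typedef enum { STATUS_A, STATUS_B } StructStatus;

typedef struct {
    StructStatus status;  /* which union member was assigned last */
    union {
        int a;
        void *b;
    } u;
} TaggedUnion;

/* Hypothetical: what the compiler would emit for "x.a = v" and "x.b = p". */
void set_a(TaggedUnion *x, int v)   { x->u.a = v; x->status = STATUS_A; }
void set_b(TaggedUnion *x, void *p) { x->u.b = p; x->status = STATUS_B; }

/* GC side: address of the slot to trace/update, or NULL when the union
   holds an int that must never be treated as a pointer. */
void **gc_pointer_slot(TaggedUnion *x) {
    assert(x->status == STATUS_A || x->status == STATUS_B);
    return x->status == STATUS_B ? &x->u.b : NULL;
}
```

A moving collector would only rewrite the slot when `gc_pointer_slot` returns non-NULL, so an `int` that happens to look like an address is never touched.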

Not sure how custom allocators mess up the GC; I thought they were 
independent of it anyway. If a pointer points to something outside the 
GC heap, the GC doesn't bother changing it, collecting it, or moving 
anything.
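To make the "outside the GC heap" point concrete, here is a toy C sketch of the check a compacting collector could run before fixing up a pointer. It assumes, for simplicity, that the whole from-space is copied to to-space at the same offsets; `HeapRange`, `in_gc_heap` and `forward_slot` are made-up names, not part of any real D runtime.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bounds of the GC-managed region being moved: [lo, hi). */
typedef struct {
    uintptr_t lo, hi;
} HeapRange;

int in_gc_heap(const HeapRange *h, const void *p) {
    assert(h->lo <= h->hi);
    uintptr_t a = (uintptr_t)p;
    return a >= h->lo && a < h->hi;
}

/* Rewrite *slot only if it points into the moved region. Pointers into
   malloc'd memory, custom allocators or C libraries are left alone. */
void forward_slot(const HeapRange *from, uintptr_t to_lo, void **slot) {
    if (in_gc_heap(from, *slot))
        *slot = (void *)((uintptr_t)*slot - from->lo + to_lo);
}
```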

Assembler is a bit tricky, and maybe someone smarter than I can handle 
it better, but here's a shot with some pseudo-asm:

struct Foo
{
   int member1;
   int member2;
}

...

Foo* foo = new Foo;  // GC-allocated, so the collector could move it
int extracted;
// foo is spotted in the asm block below, never mind the context,
// so the object it points to gets pinned.
asm
{
   mov EAX, foo;         // EAX = foo;
   add EAX, 4;           // EAX += 4; (offset of member2)
   mov EAX, [EAX];       // EAX = *EAX; x86 has no memory-to-memory mov
   mov extracted, EAX;   // extracted = foo.member2;
}
// the object is unpinned here

As for C libraries, it seems like the same situation as custom 
allocators. The C heap is outside the GC's jurisdiction and won't be 
moved or manipulated in any way. C code that handles D objects will have 
to be careful, and the calling D code will have to pin those objects 
before they go out into the unknown.
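Both the asm case and the C-library case come down to pinning. A minimal C sketch, assuming a copying collector with a per-object header: a pin count, plus an evacuation step that leaves pinned objects where they are. `ObjHeader`, `gc_pin`, `gc_unpin` and `gc_evacuate` are hypothetical names, and the caller is assumed to provide a suitably aligned to-space with enough room.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical header in front of every GC object; payload follows it. */
typedef struct ObjHeader {
    unsigned pin_count;         /* > 0 means the object must not move */
    size_t size;                /* payload bytes after the header */
    struct ObjHeader *forward;  /* new address once evacuated, else NULL */
} ObjHeader;

void gc_pin(ObjHeader *h) { h->pin_count++; }

void gc_unpin(ObjHeader *h) {
    assert(h->pin_count > 0);
    h->pin_count--;
}

/* Copy h into to-space at *alloc_ptr unless it is pinned.
   Returns where the object lives after this collection. */
ObjHeader *gc_evacuate(ObjHeader *h, char **alloc_ptr) {
    if (h->pin_count > 0) return h;             /* pinned: asm/C code still sees it here */
    if (h->forward != NULL) return h->forward;  /* already copied this cycle */
    ObjHeader *copy = (ObjHeader *)*alloc_ptr;
    memcpy(copy, h, sizeof *h + h->size);       /* header + payload */
    copy->forward = NULL;
    *alloc_ptr += sizeof *h + h->size;
    h->forward = copy;                          /* later references get redirected */
    return copy;
}
```

The compiler would wrap the asm block, or a call into C, in a `gc_pin`/`gc_unpin` pair; a count rather than a flag lets nested pins of the same object work.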

> 
> On the bright side, I believe there's considerably less need to 
> heap-allocate in D than, say, in Java, and even when used, one can 
> overcome a bad(slow) GC in many cases (with stuff like malloc/free, 
> delete, etc.), so the performance of GC is not as critical.

structs are teh rulez.

I'm still not comfortable with manual memory management in D, though, 
mostly because the standard library (Phobos) is built with the GC in 
mind and will probably leak the hell out of my program if I trust it 
too far. The alternatives are rolling my own functions, which sucks, or 
sticking with std.c, which also sucks because IMO it's not nearly as 
nice as Phobos.

Mostly I agree with this though.

Also, I wonder: suppose I wrote a tool that does escape analysis on your 
program and finds that a number of class instances can either be 
stack-allocated or safely deleted once execution reaches a certain point 
in the code. Would that change the effectiveness of a generational GC? 
Perhaps young objects die so often partly because they are temporaries 
that could be safely deleted at the end of their scope, or some such.

> 
> If the compiler/GC were improved to differentiate between atomic and 
> non-atomic data (the latter contains pointers to other data, the former 
> doesn't), so memory areas that can't contain pointers don't get scanned 
> at all*, I think I'd already be quite happy with the state of things.
> 
> 
> xs0
> 
> *) that may already be the case, but last time I checked it wasn't :)

I'd love this optimization. It doesn't seem too hard to do, either: the 
GC needs a new heap and a new allocation function, and the compiler 
needs to be taught to call that function for data that contains no 
pointers.
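A rough C sketch of the atomic/non-atomic split, assuming a toy heap where every block carries a no-scan flag (one way to get the effect of a separate heap without physically splitting it). `gc_alloc`, `gc_alloc_atomic`, `gc_mark` and `gc_is_marked` are hypothetical names; the compiler would be the one choosing `gc_alloc_atomic` for types with no pointers in them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical block header: no_scan marks "atomic" data holding no pointers. */
typedef struct Block {
    bool marked;
    bool no_scan;
    size_t nslots;
    void *slots[];   /* payload, one machine word per slot */
} Block;

#define MAX_BLOCKS 64
static Block *heap[MAX_BLOCKS];
static size_t nblocks;
size_t gc_words_scanned;   /* statistics: payload words the marker actually read */

static void **alloc_block(size_t nslots, bool no_scan) {
    assert(nblocks < MAX_BLOCKS);
    Block *b = calloc(1, sizeof *b + nslots * sizeof(void *));
    if (b == NULL) return NULL;
    b->no_scan = no_scan;
    b->nslots = nslots;
    heap[nblocks++] = b;
    return b->slots;
}

/* Ordinary allocation: the payload may contain pointers and will be scanned. */
void **gc_alloc(size_t nslots) { return alloc_block(nslots, false); }

/* Atomic allocation: ints, floats, char arrays and so on. Never scanned. */
void **gc_alloc_atomic(size_t nslots) { return alloc_block(nslots, true); }

/* Conservative lookup: which block's payload (if any) contains address p? */
static Block *find_block(const void *p) {
    uintptr_t a = (uintptr_t)p;
    for (size_t i = 0; i < nblocks; i++) {
        uintptr_t lo = (uintptr_t)heap[i]->slots;
        uintptr_t hi = lo + heap[i]->nslots * sizeof(void *);
        if (a >= lo && a < hi) return heap[i];
    }
    return NULL;
}

/* Mark everything reachable from p; atomic blocks are marked but never read. */
void gc_mark(const void *p) {
    Block *b = find_block(p);
    if (b == NULL || b->marked) return;
    b->marked = true;
    if (b->no_scan) return;   /* cannot contain pointers, so skip the contents */
    for (size_t i = 0; i < b->nslots; i++) {
        gc_words_scanned++;
        gc_mark(b->slots[i]);
    }
}

bool gc_is_marked(const void *p) {
    Block *b = find_block(p);
    return b != NULL && b->marked;
}
```

If an atomic block happens to contain a bit pattern that looks like the address of another block, the marker never reads it, so that other block is not kept alive by accident and the scan does less work.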