Is a moving GC really needed?

Mon Oct 2 04:44:31 PDT 2006

Lionello Lunesu wrote:
> I've noticed that some design decisions are made with the possibility of 
> a moving GC in mind. Will D indeed end up with a moving GC? If so, why? 
> Is a moving GC really worth the extra trouble?
> 
> Being able to move memory blocks reduces memory fragmentation, am I 
> correct? Is this the only reason? (For the remainder of this post, I'm 
> assuming it is.)
> 
> I've experienced the problems of memory fragmentation first hand. In the 
> project I'm working on (3D visualization software) I've had to track 
> out-of-memory errors, which turned out to be because of virtual memory 
> fragmentation. At some point, even a malloc/VirtualAlloc (the MS CRT's 
> malloc directly calls VirtualAlloc for big memory blocks) for 80MB 
> failed. Our problems were resolved by reserving a huge block (~1GB) of 
> virtual memory at application start-up, to prevent third-party DLLs from 
> fragmenting the virtual address space.
> 
> One of the reasons we ran into problems with memory fragmentation was 
> that Windows is actually only using 2GB of virtual address space. Using 
> Windows Address Extension (a flag passed to the linker), however, it is 
> possible to get the full 4GB of virtual address space available. That's 
> an extra 2GB of continuous virtual address space! In the (near) future 
> we'll have 2^64 bytes of virtual address space, which "should be enough 
> for anyone".
> 
> Is the extra complexity and run-time overhead of a moving GC worth the 
> trouble, at this point in time?

While I'm no expert, I doubt a moving GC is even possible in a systems 
language like D.

First, if you move things around, you obviously need to be precise when 
updating pointers, lest all hell break loose. But how do you update

union {
     int a;
     void* b;
}

? While you could forbid overlap of pointers and non-pointer data, what 
about custom allocators, assembler, C libraries (including OS runtime!), 
etc.?
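
To make the union problem concrete, here is a minimal sketch in C (the 
union above compiles as C too). It is only an illustration under stated 
assumptions: `Slot`, `relocate` and `integer_gets_corrupted` are 
hypothetical names, and the integer member is widened to `uintptr_t` so 
that it spans a whole pointer on 64-bit targets. It shows what happens 
when a collector "fixes up" a slot that holds a number whose bits merely 
look like the address of a moved object.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same shape as the union above; 'a' is widened from int (an assumption
   of this sketch) so both members have the same size. */
typedef union {
    uintptr_t a;   /* plain number the collector must leave alone */
    void     *b;   /* reference the collector must update */
} Slot;

/* What a moving collector would do to every slot it believes holds a
   pointer to the object that moved from 'from' to 'to'. */
static void relocate(Slot *s, void *from, void *to)
{
    assert(s != NULL);
    if (s->b == from)
        s->b = to;
}

/* Returns 1 if "fixing up" a slot that only holds a number changed
   that number. */
static int integer_gets_corrupted(void)
{
    static int old_home, new_home;      /* stand-ins for the two locations */
    Slot number;

    number.a = (uintptr_t)&old_home;    /* a number that looks like an address */
    uintptr_t before = number.a;

    relocate(&number, &old_home, &new_home);
    return number.a != before;
}
```

Without type information the collector has two bad options: rewrite the 
slot and risk corrupting an integer, as here, or leave it alone and risk 
a dangling pointer.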

And second, for the generational case, you need an efficient way to 
track references from older objects to newer objects, otherwise you need 
to scan them all, defeating the point of having generations in the first 
place. While a JIT-compiled language/runtime can relatively easily (and 
efficiently) do this by injecting appropriate code (write barriers) 
wherever older objects are written to, I think it's practically 
impossible to do so with native code.
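
For reference, the usual tracking mechanism is a write barrier with a 
card table. The C sketch below is a toy model, not how any particular 
runtime does it: the heap layout, the sizes and the names `store_ptr`, 
`dirty_cards` and `demo_dirty_cards` are all assumptions made for 
illustration. Every pointer store into the "old" part of the heap marks 
a card, and a minor collection then only scans the marked cards.

```c
#include <assert.h>
#include <stddef.h>

#define HEAP_WORDS 1024u                 /* toy heap size in slots */
#define OLD_WORDS   512u                 /* slots below this index are "old" */
#define CARD_WORDS   64u                 /* slots covered by one card */
#define CARD_COUNT (HEAP_WORDS / CARD_WORDS)

static void *heap[HEAP_WORDS];           /* pointer-sized slots */
static unsigned char dirty[CARD_COUNT];  /* one flag per card */

/* The write barrier: every pointer store must go through here. */
static void store_ptr(size_t slot, void *value)
{
    assert(slot < HEAP_WORDS);
    heap[slot] = value;
    if (slot < OLD_WORDS)                /* store into an old object */
        dirty[slot / CARD_WORDS] = 1;    /* remember its card */
}

/* Number of cards a minor collection would have to scan. */
static size_t dirty_cards(void)
{
    size_t n = 0;
    for (size_t i = 0; i < CARD_COUNT; i++)
        n += dirty[i];
    return n;
}

/* Two stores into old space (same card) and one into young space. */
static size_t demo_dirty_cards(void)
{
    store_ptr(3,   &heap[600]);   /* old -> young, marks card 0 */
    store_ptr(10,  &heap[700]);   /* old -> young, card 0 again */
    store_ptr(900, &heap[5]);     /* store into young space, no mark */
    return dirty_cards();
}
```

The scheme only works if no store bypasses `store_ptr`. A plain 
`heap[100] = p;` from assembler or a C library would leave the card 
clean and the reference unrecorded, and the extra branch and store on 
every pointer write is a cost paid during normal operation.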

I've no idea how to overcome those without involving the end-user 
(programmer) and/or losing quite a lot of speed during normal operation, 
which I'm quite sure are not acceptable trade-offs.

On the bright side, I believe there's considerably less need to 
heap-allocate in D than, say, in Java, and even where the heap is used, 
one can work around a bad (slow) GC in many cases (with stuff like 
malloc/free, delete, etc.), so the performance of the GC is not as 
critical.

If the compiler/GC were improved to differentiate between atomic and 
non-atomic data (the latter contains pointers to other data, the former 
doesn't), so memory areas that can't contain pointers don't get scanned 
at all*, I think I'd already be quite happy with the state of things.
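
A rough C sketch of what that distinction buys: each block carries a 
flag saying whether it may contain pointers, and the mark phase skips 
the blocks that can't. The `Block` struct and the functions 
`words_to_scan` and `demo_scan_cost` are hypothetical, made up for this 
example. The Boehm collector draws the same line with its 
`GC_malloc_atomic` allocation call.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    void  *start;         /* first byte of the block */
    size_t words;         /* size in pointer-sized words */
    int    has_pointers;  /* 0 = "atomic" data, e.g. char or float arrays */
} Block;

/* Returns how many words a mark phase would have to examine. */
static size_t words_to_scan(const Block *blocks, size_t count)
{
    assert(blocks != NULL || count == 0);
    size_t total = 0;
    for (size_t i = 0; i < count; i++)
        if (blocks[i].has_pointers)      /* atomic blocks are skipped */
            total += blocks[i].words;
    return total;
}

/* A small pointer array next to a large buffer of doubles. */
static size_t demo_scan_cost(void)
{
    static void  *links[16];        /* may hold pointers */
    static double samples[4096];    /* can never hold pointers */
    Block blocks[] = {
        { links,   16, 1 },
        { samples, sizeof samples / sizeof(void *), 0 },
    };
    return words_to_scan(blocks, 2);
}
```

Only the 16 words of `links` get examined here; the much larger 
`samples` buffer costs nothing at collection time.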

xs0

*) that may already be the case, but last time I checked it wasn't :)