Is a moving GC really needed?

Kevin Bealer kevinbealer at gmail.com
Wed Oct 4 00:53:00 PDT 2006


Lionello Lunesu wrote:
> Chris Miller wrote:
>> On Mon, 02 Oct 2006 05:23:11 -0400, Lionello Lunesu 
>> <lio at lunesu.remove.com> wrote:
>>
>>> I've noticed that some design decisions are made with the possibility 
>>> of a moving GC in mind. Will D indeed end up with a moving GC? If so, 
>>> why? Is a moving GC really worth the extra trouble?
>>>
>>> Being able to move memory blocks reduces memory fragmentation, am I 
>>> correct? Is this the only reason? (For the remainder of this post, 
>>> I'm assuming it is.)
>>>
>>
>> I believe it allows for very fast allocations, almost as fast as stack 
>> allocation. This is something I'm very interested in.
> 
> Why would it allocate faster than a non-moving GC? (Ignoring for the 
> moment the cost of moving.)
> 
>> It may also improve caching as allocated memory is more together.
> 
> This I understand, but what's the granularity (if that's the correct 
> term) of a CPU cache?
> 
> http://en.wikipedia.org/wiki/CPU_cache#Example:_the_K8 : "The data cache 
> keeps copies of 64 byte lines of memory."
> 
> I guess the chances are pretty low for two blocks to be moved close 
> together and be cached together, and be accessed together..
> 
> L.

If you think about it from the point of view of disk blocks or pages it 
makes more sense, but even cache lines could benefit some.

What happens is that normal applications typically have linked lists and 
arrays of pointers to sub-structures.  Moving allocators normally 
traverse these objects in depth-first order, copying the data into some 
destination area as they scan it, rather than only moving blocks to fill 
gaps as you might guess.
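That destination area also explains the fast-allocation point above: 
after a compacting collection, the free space is one contiguous run, so 
an allocation is just "bump a pointer and check a limit".  Here's a toy 
sketch of the idea (a simulated heap, not D's actual allocator):

```python
# Toy bump-pointer allocator over a simulated heap. After a compacting
# (moving) collection, free space is one contiguous region, so an
# allocation is a single increment plus a bounds check -- close to
# stack-allocation cost, and cheaper than scanning a free list.
# (Illustrative only; not how D's GC is actually implemented.)

class BumpHeap:
    def __init__(self, size):
        self.size = size
        self.next = 0          # address of the first free byte

    def alloc(self, n):
        if self.next + n > self.size:
            return None        # a real GC would collect here
        addr = self.next
        self.next += n         # the entire "allocator" is this increment
        return addr

heap = BumpHeap(1024)
a = heap.alloc(24)
b = heap.alloc(24)
assert (a, b) == (0, 24)       # successive allocations are adjacent
```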

Copying a linked list in depth-first order means that the nodes of the 
list are copied to a new region of memory - and end up *sequential* in 
memory as a result.  From then on, they are sequential and contiguous, 
packed into nearly as few pages as possible.
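A small simulation of that effect (made-up addresses, and a toy model 
that skips pointer fixup): nodes start scattered across the heap, then a 
depth-first evacuation copies each one to the next free slot of a fresh 
to-space, so afterwards list order and address order coincide:

```python
# Simulated compaction: linked-list nodes start at scattered
# "addresses"; a depth-first evacuation copies them, in traversal
# order, into consecutive slots of a to-space. (Toy model; a real
# collector would also rewrite the pointers via the forwarding table.)

def evacuate(head, nodes):
    """Copy the list depth-first into a new heap; return the to-space
    contents and an old-address -> new-address forwarding table."""
    to_space = {}              # new_addr -> node value
    forward = {}               # old_addr -> new_addr
    next_free = 0
    addr = head
    while addr is not None:
        value, succ = nodes[addr]
        forward[addr] = next_free
        to_space[next_free] = value
        next_free += 1
        addr = succ
    return to_space, forward

# A 4-node list scattered across the heap: 900 -> 13 -> 501 -> 77.
nodes = {900: ('a', 13), 13: ('b', 501), 501: ('c', 77), 77: ('d', None)}
to_space, forward = evacuate(900, nodes)

# After evacuation the nodes occupy addresses 0,1,2,3 in list order.
assert [forward[a] for a in (900, 13, 501, 77)] == [0, 1, 2, 3]
assert [to_space[i] for i in range(4)] == ['a', 'b', 'c', 'd']
```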

Later, as nodes are added to the middle of the list, the list gets 
'scattered', but is always recollected and re-sequentialized during the 
next sweep.

The same is true for arrays of pointers, binary trees, and many other 
structures.  It's probably less important for hash tables and "value" 
arrays.  It also matters more when memory is tight, because a miss on a 
disk page costs far more than a miss on a cache line.

Java programmers are taught that "ArrayList" is the best list for most 
applications -- it's basically C++'s std::vector -- so arrays have 
become the norm in Java and C++, and the argument for moving collectors 
is probably weaker than it once was.
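Either way, the payoff is the same thing measured: how many cache lines 
(or pages) a traversal touches.  A rough count with made-up addresses, 
purely illustrative - four scattered 16-byte nodes versus the same four 
packed contiguously, as a compacted list or an ArrayList/std::vector 
would lay them out:

```python
# Rough locality comparison: count the 64-byte cache lines touched by a
# traversal of four 16-byte nodes. Scattered nodes (a fragmented linked
# list) cost one line each; packed nodes (a compacted list, or an
# array-backed container) share lines. Addresses are invented for
# illustration.

LINE = 64

def lines_touched(addresses, node_size=16):
    touched = set()
    for a in addresses:
        for byte in range(a, a + node_size):
            touched.add(byte // LINE)
    return len(touched)

scattered = [0, 256, 1024, 4096]   # one cache line per node
packed    = [0, 16, 32, 48]        # all four nodes in a single line

assert lines_touched(scattered) == 4
assert lines_touched(packed) == 1
```

The same arithmetic applies with 4 KB pages in place of 64-byte lines, 
which is where the difference really hurts.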

For Lisp-like languages (PostScript, Lisp, Scheme, SML, OCaml), this is 
probably a bigger deal.

Kevin



More information about the Digitalmars-d mailing list