Is a moving GC really needed?
Kevin Bealer
kevinbealer at gmail.com
Wed Oct 4 00:53:00 PDT 2006
Lionello Lunesu wrote:
> Chris Miller wrote:
>> On Mon, 02 Oct 2006 05:23:11 -0400, Lionello Lunesu
>> <lio at lunesu.remove.com> wrote:
>>
>>> I've noticed that some design decisions are made with the possibility
>>> of a moving GC in mind. Will D indeed end up with a moving GC? If so,
>>> why? Is a moving GC really worth the extra trouble?
>>>
>>> Being able to move memory blocks reduces memory fragmentation, am I
>>> correct? Is this the only reason? (For the remainder of this post,
>>> I'm assuming it is.)
>>>
>>
>> I believe it allows for very fast allocations, almost as fast as stack
>> allocation. This is something I'm very interested in.
>
> Why would it allocate faster than a non-moving GC? (Ignoring for the
> moment the cost of moving.)
>
>> It may also improve caching as allocated memory is more together.
>
> This I understand, but what's the granularity (if that's the correct
> term) of a CPU cache?
>
> http://en.wikipedia.org/wiki/CPU_cache#Example:_the_K8 : "The data cache
> keeps copies of 64 byte lines of memory."
>
> I guess the chances are pretty low for two blocks to be moved close
> together and be cached together, and be accessed together...
>
> L.
If you think about it from the point of view of disk blocks or pages, it
makes more sense, but even cache lines can benefit somewhat.
What happens is that normal applications typically have linked lists and
arrays of pointers to sub-structures. Moving collectors normally
traverse these objects in depth-first order, copying the data into
some destination area as they scan it, rather than only moving objects
to fill gaps as you might guess.
Copying a linked list in depth-first order means that the nodes of the
list are copied to a new region of memory - and end up *sequential* in
memory as a result. From then on, they are sequential and contiguous,
packed into nearly as few pages as possible.
Later, as nodes are added to the middle of the list, the list gets
'scattered', but is always recollected and re-sequentialized during the
next sweep.
The same is true for arrays of pointers, binary trees, and many other
structures. It's probably less important for hash tables and "value"
arrays. It also matters more when memory is full, because a missed disk
page costs far more than a missed cache line.
Java programmers are taught that "ArrayList" is the best list for most
applications -- it's basically C++'s std::vector -- so arrays have now
become the norm in Java and C++, and the argument for moving collectors
is probably weaker than it once was.
For Lisp-like languages (PostScript, Lisp, Scheme, SML, OCaml), this is
probably a bigger deal.
Kevin
More information about the Digitalmars-d
mailing list