More precise GC

Robert Jacques sandford at jhu.edu
Sun Mar 28 07:16:21 PDT 2010


On Sun, 28 Mar 2010 09:40:10 -0300, bearophile <bearophileHUGS at lycos.com>  
wrote:
> This is just a small invention.
> Enums can decrease the precision of the GC, because fields can be  
> pointers or values, an example:
>
>
> enum NodeType { value, sum, product }
>
> struct Node {
>     NodeType type;
>
>     union {
>         double val;
>         struct {
>             Node* left, right;
>         }
>     }
> }
>
> (Node can be used to build a simple expression tree, that can contain  
> things like: 2.5 * 3.0 + 2.0).
>
> In this case the GC has to use the safe strategy of always following the  
> left and right pointers, even when they are a double.
>
> At runtime the program usually knows what's inside the enum, if the enum  
> truly contains pointers. This information can be inside a tag as in this  
> example, inside a tagged pointer that points to the struct/enum, or at  
> worst in the code that uses the struct/enum. But generally the compiler  
> is not able to use this information, because a compiler usually doesn't  
> understand the program semantics.
>
> So I can think of a new optional standard class/struct/enum method, for  
> example gcfollow() that the garbage collector recognizes when it is  
> present and useful. It gets called with the attribute name and returns  
> true at runtime if the the GC has to follow a pointer/reference:
>
>
> struct Node {
>     NodeType type;
>
>     union {
>         double val;
>         struct {
>             Node* left, right;
>         }
>     }
>
>     bool gcfollow(string field)() {
>         return type != NodeType.value;
>     }
> }
>
>
> Note that the GC will call gcfollow() (here a template for more  
> efficiency) only with the strings of "left" or "right", because "type"  
> and "val" attributes can't contain pointers/references.
>
> So this gcfollow() will be instantiated only with "left" or "right", and  
> in both cases it returns true if the node isn't a value, so the GC will  
> not mistake the "val" double in the leaves of this expression tree as a  
> couple of pointers. The GC will follow the left and right pointers only  
> for the internal nodes of the tree. This slows down the GC a little but  
> increases its precision (if gcfollow() has no bugs).
>
> If gcfollow() is not present the GC acts as it does now, assuming the  
> pointers are always present.
>
> If a struct contains no pointers/references the gcfollow() is never  
> used. If a struct contains another struct, the GC will call gcfollow()  
> of the inner struct too if it contains pointers/references, etc.
>
> Bye,
> bearophile

This would require structs/arrays/etc to contain a header with a vtable,  
so the GC could know what to do. You'd probably save memory (the headers  
cost 16 bytes) and would definitely save collection time simply breaking  
those unions into things with pointers and things without pointers. In  
your example, doing that would cost some extra memory, as you'd go from a  
16-byte block to a 32-byte block. However, remember, the GC allocates on  
16-byte boundaries so each Node* actually has 4-bits (8 total) in which to  
hide an enum.



More information about the Digitalmars-d mailing list