Potential GSoC project - GC improvements

Sun Mar 13 15:22:12 PDT 2016

On Sun, 13 Mar 2016 12:43:37 -0700, Adam Wilson wrote:
>> A "partially moving" GC does not exist, as far as I know.
>>
>>
> Yep, it's a Bad Idea.

It's not a standard term. Google's only seeing about four references to 
the term, none of them authoritative or definitive. Since it's non-
standard, it would behoove you to define it.

I think you mean a GC in which some objects are pinned (due to 
limitations in the GC rather than the user explicitly requesting that 
they be pinned).

Regardless, you ought to define the term and explain why it's a bad idea 
instead of providing a bare assertion. It doesn't make any difference to 
the operation of the GC *why* something's pinned; what matters is how 
much is pinned.

So, for instance, you could point to a conservative moving GC, select one 
circumstance in which it can't move objects, tell us how much memory that 
ends up pinning on average, and then you'll have an argument.

>> In D, we have a tiny bit of type information surfaced to the GC to
>> reduce the incidence of false pointers. We could additionally take
>> advantage of the fact that numbers tend to be small and people tend to
>> use 4 to 8 byte numbers: we could arrange our address space to avoid
>> ranges where false pointers are likely. But that's probably
>> unnecessary.
>>
>>
> We are not limited by what we currently have. Most of Rainer's work
> involved surfacing more data to the GC.

Sure. My point is that we don't *need* a lot of type information in the GC 
to get reasonable advantages. A little type information goes a long way,  
and a little caution in placing objects in the heap can help too.

>> In D, thanks to unions, we can never create a fully precise collector,
>> but we can create a moving collector. Objects pointed to from a union
>> can't be moved, but few objects will be pointed to from a union.
>
> Actually, we probably could with modifications to the compiler.

No, we couldn't. I've been assuming that we can make whatever 
modifications we want to the compiler for free.

The compiler has static type information. This explicitly contains 
ambiguities. When it's based solely on data available at compile time, 
the compiler can resolve the ambiguities -- and you could have written 
the code without using a union.

But in the majority of cases, people use a union because they must. That 
means which values are set depends on runtime data, and the compiler 
can't tell you which fields will be set.

We could provide a runtime function that is invoked when you set a value 
inside a union. That won't work very well. Consider:

  struct A { long a; long b; }
  struct B { double a; void* p; }
  union AB { A a; B b; }

  extern void foo(ref A a);

  AB ab;
  ab.b.p = GC.malloc(512);
  foo(ab.a);

Was that pointer overwritten? No clue! The compiler can't tell. When 
compiling the definition of foo(), it didn't have any idea whether the 
struct was a union member. Now we have to intercept *every* indirect 
write to *any* value with a runtime call, even if you never use a union 
in your program --

Oh look, now Python's beating us by an order of magnitude in benchmarks.

What do we get for it? Well, the 1% of programs that use this pattern 
will allow the GC to move an additional 0.1% of their objects.

Or we leave the GC conservative in this one case. We can detect this case 
pretty easily and treat apparent pointers as pinned. Everything will 
work, and the GC will be better at releasing memory than it currently is.