Potential GSoC project - GC improvements

Adam Wilson via Digitalmars-d digitalmars-d at puremagic.com
Sun Mar 13 16:34:44 PDT 2016


Chris Wright wrote:
> On Sun, 13 Mar 2016 12:43:37 -0700, Adam Wilson wrote:
>>> A "partially moving" GC does not exist, as far as I know.
>>>
>>>
>> Yep, it's a Bad Idea.
>
> It's not a standard term. Google's only seeing about four references to
> the term, none of them authoritative or definitive. Since it's non-
> standard, it would behoove you to define it.
>
> I think you mean a GC in which some objects are pinned (due to
> limitations in the GC rather than the user explicitly requesting that
> they be pinned).
>
> Regardless, you ought to define the term and explain why it's a bad idea
> instead of providing a bare assertion. It doesn't make any difference to
> the operation of the GC *why* something's pinned; what matters is how
> much is pinned.
>
> So, for instance, you could point to a conservative moving GC, select one
> circumstance in which it can't move objects, tell us how much memory that
> ends up pinning on average, and then you'll have an argument.
>

Is there an implementation of a conservative moving (compacting) GC out 
there? I'm not aware of one, but there are a lot of GC's out there. 
Boehm isn't.

>>> In D, we have a tiny bit of type information surfaced to the GC to
>>> reduce the incidence of false pointers. We could additionally take
>>> advantage of the fact that numbers tend to be small and people tend to
>>> use 4 to 8 byte numbers: we could arrange our address space to avoid
>>> ranges where false pointers are likely. But that's probably
>>> unnecessary.
>>>
>>>
>> We are not limited by what we currently have. Most of Rainer's work
>> involved surfacing more data to the GC.
>
> Sure. My point is that we don't *need* a lot of type information in the GC
> to get reasonable advantages. A little type information goes a long way,
> and a little caution in placing objects in the heap can help too.
>
>>> In D, thanks to unions, we can never create a fully precise collector,
>>> but we can create a moving collector. Objects pointed to from a union
>>> can't be moved, but few objects will be pointed to from a union.
>>
>> Actually, we probably could with modifications to the compiler.
>
> No, we couldn't. I've been assuming that we can make whatever
> modifications we want to the compiler for free.
>
> The compiler has static type information. This explicitly contains
> ambiguities. When it's based solely on data available at compile time,
> the compiler can resolve the ambiguities -- and you could have written
> the code without using a union.
>
> But in the majority of cases, people use a union because they must. That
> means which values are set depends on runtime data, and the compiler
> can't tell you which fields will be set.
>
> We could provide a runtime function that is invoked when you set a value
> inside a union. That won't work very well. Consider:
>
>    struct A { long a; long b; }
>    struct B { double a; void* p; }
>    union AB { A a; B b; }
>
>    extern void foo(ref A a);
>
>    AB ab;
>    ab.b.p = GC.malloc(512);
>    foo(ab.a);
>
> Was that pointer overwritten? No clue! The compiler can't tell. When
> compiling the definition of foo(), it didn't have any idea whether the
> struct was a union member. Now we have to intercept *every* indirect
> write to *any* value with a runtime call, even if you never use a union
> in your program --
>
> Oh look, now Python's beating us by an order of magnitude in benchmarks.
>
> What do we get for it? Well, the 1% of programs that use this pattern
> will allow the GC to move an additional 0.1% of their objects.
>
> Or we leave the GC conservative in this one case. We can detect this case
> pretty easily and treat apparent pointers as pinned. Everything will
> work, and the GC will be better at releasing memory than it currently is.
>

Ok, so yes, it needs conservative scanning in this case. Because even in 
C# I can pin things, so obviously it has valid use cases. I'm not 
absolutely against pinning and unions aren't going away, and if the GC 
would have to be go conservative then so be it.  And given that unions 
are a pretty niche use case, I don't think we should drop the idea of 
precision just because it can't be 100% precise, so we end up with a few 
conservatively-scanned edge cases, big deal, the benefits of precision 
to the whole ecosystem would be enormous in the general case.

Is this a debate about precise vs. non-precise GC or are we just 
bikeshedding about terminology and technical details? ;) I moved 
recently and lost track of the GC book for awhile so I've been 
refreshing my terminology. :P

-- 
// Adam Wilson
// quiet.dlang.dev


More information about the Digitalmars-d mailing list