Potential GSoC project - GC improvements

Sun Mar 13 12:43:37 PDT 2016

Chris Wright wrote:
> On Sat, 12 Mar 2016 13:23:35 -0800, Adam Wilson wrote:
>
> To start off, let's talk terminology. You seem to be using nonstandard
> terminology and possibly misunderstanding standard terminology.
>
> A GC scan is the mark phase of a mark/sweep collector (and specifically
> the part where the GC examines roots and allocated memory, if that
> matters). The GC searches for pointers to GC memory. Simple GCs only need
> to record whether there is a pointer to a given allocation, not where
> those pointers are coming from.
>
> A conservative GC is one that will encounter some things that aren't
> pointers but must be scanned as if they were pointers. In D, for
> instance, a union between a pointer and a size_t forces collectors to be
> conservative.
>
> A false pointer is something that a conservative GC must treat as a
> pointer and appears to point to an object allocated by that GC. For
> instance, unions in D allow us to create false pointers.
>
> A precise GC is a GC that never mistakes a non-pointer for a pointer.
>
> A moving GC is a GC that can relocate objects.
>
> A generational GC is one that can perform a collection on part of the
> heap without dealing with the full heap. Those parts tend to be organized
> by how long an object has survived. This does not require a precise or
> moving GC, though it works better with a precise, moving GC.
>

I didn't intend to confuse, but I was not writing to neophyte's, and I 
used linguistic shortcuts.

> A write barrier is a way to record which sections of memory have been
> altered since the last GC collection. This lets the GC scan only the
> changed data -- useful to any GC, especially useful to generational ones.
>
> A "partially moving" GC does not exist, as far as I know.
>

Yep, it's a Bad Idea.

>> Jeremy DeHaan wrote:
>>> On Saturday, 12 March 2016 at 08:50:06 UTC, Adam Wilson wrote:
>>>> If I may make a suggestion. The lock free work is unlikely to require
>>>> the entirety of GSoC. And the precise GC is the next most important
>>>> thing on your list and will have the biggest impact on GC performance.
>>>
>>> Rainer has two different precise GC's in pull requests right now and
>>> both are slower than the current one unless there are false pointers.
>>> I would expect anything I come up with to largely act the same. The
>>> reason I keep pushing for a generational garbage collector is because I
>>> think it would be where we would see the most benefit in terms of
>>> general performance.
>>>
>>>
>> I would contend that it's not quite that simple. Of course the precise
>> GC is slower, it's scanning more stuff.
>
> Precise GCs are slower because they must reference some data telling them
> which items on the stack and in allocations represent pointers and which
> do not. This data takes up cache space, and having less cache for memory
> being scanned slows things down. Correlating two pieces of data takes
> extra time, too.
>

Yep the data is usually held with the object in a header. Hence the 
"scanning more stuff".

> The hope with a precise GC is to avoid false pointers. A false pointer to
> an object that holds references to many other objects can be costly, both
> in GC time and in excess memory usage.
>
> In D, we have a tiny bit of type information surfaced to the GC to reduce
> the incidence of false pointers. We could additionally take advantage of
> the fact that numbers tend to be small and people tend to use 4 to 8 byte
> numbers: we could arrange our address space to avoid ranges where false
> pointers are likely. But that's probably unnecessary.
>

We are not limited by what we currently have. Most of Rainer's work 
involved surfacing more data to the GC.

>> The current conservative GC by
>> definition can't figure out large chunks of memory so it won't even
>> bother scanning them.
>
> It must scan anything that might be a pointer. A precise GC reduces the
> number of things that might be pointers but aren't to zero. Therefore a
> precise GC scans less.
>

Yep, precise GC's aren't totally perf-negative.

> If a conservative GC didn't scan something that might be a pointer, it
> would sometimes free an object that still had live references to it. This
> would result in memory corruption or segmentation faults.
>
>> What it does to those things it can't scan. And
>> what it can't scan it has to pin, which means the memory can't be moved.
>
> If a moving GC cannot precisely identify all pointers to an object, it is
> not allowed to move that object. If it can, it is allowed to move that
> object.
>
> A conservative GC can be a moving GC. It must classify things as
> "definitely pointer", "maybe pointer", and "not pointer". It cannot move
> an object with a "maybe pointer" pointing to it, but it can move an
> object with several "definitely pointers" pointing to it.
>
> In D, thanks to unions, we can never create a fully precise collector,
> but we can create a moving collector. Objects pointed to from a union
> can't be moved, but few objects will be pointed to from a union.
>

Actually, we probably could with modifications to the compiler. A little 
metadata goes a long way. And I know that Walter would accept that PR. I 
don't know for sure if it can be done, but I've seen precious little 
evidence one way or the other. Let's investigate before making absolute 
statements about what we can do. I know from discussions with Walter 
that the compiler's AST contains a wealth of information, and that more 
can be added.

>> You can't build a fully generational GC until you can move everything.
>
> You can build a more efficient generational GC if you can move objects.
> However, it's not impossible to build a non-moving generational GC.
> Instead of having large blocks of address space for each generation, you
> would have each Pool indicate its generation. Then you have to find the
> Pool for a pointer and check that flag.
>
> That's comparatively slow, but it'd still be faster to do a Gen 0 scan
> that way than to do a full scan with D's current collector (assuming you
> implement write barriers).
>

My apologies I didn't intend to state that it was impossible. I was 
attempting to emphatically question the wisdom of doing so, particular 
when the potential performance benefits are dubious at best. My point is 
that we should do this right, if we keep blindly pursuing performance 
without taking a considered approach to achieving it, we'll never get 
there. The GC will just become an ever larger pile of hacks that 
increase performance in specific cases.

>> They talk about this problem in the books. However, once you have a
>> precise GC, a fully moving GC becomes possible and only then do the
>> performance benefits of a generational GC become realized.
>
> The performance benefit of a moving and generational GC over a non-moving
> generational GC is that you can have a contiguous section of address
> space for each generation, allocated densely. You get this with less
> overhead.
>

That's actually pretty important. Precision enables moving, moving has 
clear and enumerable performance benefits. QED.

> However, a moving GC can be a pessimization for user code. If you
> allocate several objects in sequence, with many GCs you'll get memory
> allocated for them that is contiguous or nearly contiguous. For small
> objects, this is probably going to be on the same page of memory. But a
> moving GC is free to move them to different pages. So you need to
> exercise caution.
>

Yes ... and this is also dealt with in the books, including basic 
algorithms to help combat these problems. Additionally, as evidenced by 
the fact that every major GC (JVM, CLR, et al.) system implements a 
moving GC, I'd argue that the benefits greatly outweigh any potential 
risks. And we have pull requests to further optimize the algorithms. We 
don't have a system at all yet, let's build it first, then we can work 
on optimizing it.

>>> I have concerns about doing any compaction
>>> though, mostly because D can have both references and pointers to
>>> objects, and I don't know how we would want to go about updating
>>> pointers. Especially since pointers to D objects can exists in C and
>>> C++
>>> code.
>>>
>>>
>> Erm, a generational GC is, in practice, a moving GC as it must move the
>> blobs around from the various heaps.
>
> The Boehm collector is generational, conservative, and not moving. You
> probably conflate "moving" and "generational" because of the JVM.
>

Ok, yes, but lets be honest, it's what we actually want, and more 
importantly, expect. Precision makes it happen, efficiently. Even the 
C++ community doesn't take the Boehm GC very seriously. We have an 
opportunity to prove the C-style language can have a performant GC.

>> As for the C/C++ code problem various solutions have been proposed here.
>> For one, D knows the difference between the two, so pinning is possible
>> there.
>
> There's nothing in the type system that says whether you can pass a
> variable to a C/C++ function. So, no, D *doesn't* know the difference
> between them. And there's no GC.pin function, so people aren't currently
> telling your GC when something's accessible to C/C++ code.
>

D does differentiate between a GC reference and pointer. Pin the 
pointers. Yes, you can create get pointer from a reference, and doing so 
is a pinning operation. We'd have to add a GC.pin function, so long as 
the compiler emits it for us. :)

> That means a moving collector in D will produce segmentation faults and
> memory corruption in existing, currently working code.
>
>> AFIAK each OS allows you to specify multiple heaps
>
> You get one address space. You get to use mmap to ask for memory at a
> given address. There's no heap_t data structure to provide to malloc or
> mmap.
>
> You might have been confused by literature referring to multiple heaps.
> That's just a single allocator deciding to divide its address space
> logically. There's no OS-level enforcement.
>

Windows has that concept via HeapCreate shown here: 
https://msdn.microsoft.com/en-us/library/windows/desktop/aa366599(v=vs.85).aspx
I'd be surprised if it's not available on other OS's, and yes, it is OS 
enforced. It might not work exactly how you like it, but that's a far 
cry from "not provided".

>> A concurrent GC without precision is mostly useless
>
> The Boehm collector is not precise. It is concurrent. Its concurrency
> reduces the amount of time a GC collection takes. This is mostly
> orthogonal to the benefits of a precise collector in reducing the amount
> of memory a process uses.
>
>> because the GC roots
>> are primarily derived from the stack which the thread was spawned and
>> the stack roots are not scanned in a conservative GC.
>
> They *are* scanned in a conservative GC. If you don't scan memory that
> might contain pointers (such as the stack), your GC will inevitably cause
> segmentation faults and memory corruption. D's GC is conservative and
> does not cause segmentation faults or memory corruption, therefore it
> must scan the stack.
>

I may have been confusing any precise GC with Rainer's precise GC which 
only did precise Heap scans. Apologies for the confusion.

I will admit that I've done better work with my wording than what is in 
my last message. However, my fundamental point is sound, precision has 
benefits across all GC workloads and enables scenarios beyond the GC 
itself. Extending the compiler can work around many of the issues with 
false pointers.

And let's engage the elephant in the room. One of Rainer's main 
learning's was the precision permeates the compiler and GC, you have to 
deal with it everywhere. So you end up rebuilding the GC from the ground 
up when you add it. What you are proposing is building a 
conservative-generational GC that will have to be jettisoned and 
rewritten *again* once precision is added for a performance boost that, 
by your own admission, may not actually happen. How is this a good use 
of time?

My statement stands. Precision is the basis from which all high-order GC 
functions flow. Yes, you can work around not having it, but such 
workarounds tend to be just that. More importantly you are inevitably 
going to end up doing a half-implementation of precision anyways because 
each of the high-order functions need parts of that information.

I know precision isn't fun. But it's extremely important work that needs 
to get done. And once we have it, the world of GC algorithms opens 
before us. Until then we are intentionally limiting ourselves. One of 
the biggest complaints about D is the GC, fine, so instead of doing yet 
another hack-job on something that may not actually improve GC 
performance, let's do it right and lay the groundwork for a modern GC 
that allows us to really compete with JVM/CLR.

-- 
// Adam Wilson
// quiet.dlang.dev