(Semi) precise GC [was: Re: Std Phobos 2 and logging library?]

Mon Apr 13 10:36:18 PDT 2009

Leandro Lucarella wrote:
> Frits van Bommel, el 13 de abril a las 13:30 me escribiste:
>>>> Or you can pin anything that's referenced from the stack, and move
>>>> anything that is only referenced from the heap.
>>> That's more likely to happen, but it requires a compiler change too
>>> (provide type information on allocation). Maybe I wasn't too clear,
>>> I didn't mean to say that a moving collector is impossible, what is
>>> impossible is to make allocation a "pointer bump".
>> The compiler already passes a TypeInfo on allocations IIRC. And TypeInfo can 
>> produce a TypeInfo[], it just happens that DMD and GDC don't fill it in for 
>> user-defined aggregates, and LDC needs a compile-time #define to enable it 
>> (because it breaks linking the Tango runtime, IIRC).
>> (For other types, this fact it returns null is a simple library issue)
> 
> Well, this is nice to know (even when it's not used yet, it's better than
> nothing). And how can the GC obtain this kind of information?

Well, since the allocation routines should all get a TypeInfo reference from the 
compiler, the GC can store the typeinfo for each memory block somewhere, and 
later use it. It can then call ti->offTi() which should return an array of 
OffsetTypeInfo structs (see object.d[i]). The only caveat is that those array 
return values should be statically allocated; the GC probably won't like an 
allocation happening during collections...

>>> pointed by that type of fields should not be moved, ever. So, even after
>>> a fresh collection, your heap can be still fragmented. You have to store
>>> information about the "holes" and take care of them. This can be very
>>> light too (in comparison with the actual allocation algorithm), but it can
>>> never be as simple as a "pointer bump" (as requested by David =).
>> Well, it may technically be possible to move a heap object right before
>> assignment to a union/void[] or passing to C if the compiler calls
>> a library function before doing something like that.
> 
> Yes, I guess it's technically possible, but again, it needs (AFAIK)
> non-trivial compiler changes.

Well, the change to the compiler might not be that big. Detection of unions, 
void[]s and C calls should be pretty simple. The lib routine might be "a bit" 
more complicated...

Though this all assumes the compiler first provides enough information about the 
stack & registers for a moving collector to be feasible -- which would probably 
be a much bigger task.

>> Then pinned objects could be allocated on a separate part of the heap
>> that never gets moved (unless no more references in untyped memory are
>> live, maybe?) and allocations could still be a pointer bump in the
>> movable part of the heap.
> 
> Sure. And what about the non-movable part of the heap ;)
> You still have to manage that, you can't simply ignore it. That's what
> I meant with this:
> 
>>> So technically, you'll always have to deal with memory fragmentation in
>>> D (I don't think anyone wants to drop unions and void[] =), and it's true
>>> that it can be minimized to almost nothing. But since it's technically
>>> possible, you can never get away from the extra complexity for managing
>>> those rare cases.

Well yeah, you'll still have a non-movable part. Hopefully it'll be much smaller 
than the movable part though. And like I said, allocations can still be pointer 
bumps -- it's the assignments to unions, void[]s and C calls that suffer...

> [...]
> 
>> I have no idea how efficient this would be, however. My guess would be
>> not very.
> 
> I'm not concerned about efficiency, I'm more concerned in non-trivial
> compiler changes.

Well, efficiency is important too. This has the potential to trigger what is 
effectively a marking of the entire heap (to find all references to an object 
that needs to be moved *now*) much more often than would otherwise happen. Like 
I said, my guess would be this isn't very efficient.

> Anyway I think the important thing here is to at least get a precise heap
> (I would be nice if one could provide type information for the root set
> too, I guess).
> 
> For me there's almost no difference between having a non-precise stack and
> unions/voids[] or having just non-precise unions/voids[]. You have to
> support non-movable objects anyway, and I guess the stack is small enough
> to be a non-problem in practice. I think the cost/benefits ratio of having
> a precise stack doesn't worth the trouble.

A precise heap would certainly be a nice starting point, but adding precise 
stack and registers might be a nice improvement over it. Especially for things 
like the Tango allocating large stack buffers to avoid heap allocs. They're 
pointerless, but the GC doesn't know that...

IIRC there have been some talks on the LLVM mailing list about how to emit stack 
and register maps, so at some point in the future LDC might actually support all 
that...