Is a moving GC really needed?
Don Clugston
dac at nospam.com.au
Fri Oct 6 03:30:03 PDT 2006
Chad J wrote:
> xs0 wrote:
>> Chad J wrote:
>>
>>> xs0 wrote:
>>>
>>>>
>>>> While I'm no expert, I doubt a moving GC is even possible in a
>>>> systems language like D.
>>>>
>>>> First, if you move things around, you obviously need to be precise
>>>> when updating pointers, lest all hell break loose. But how do you
>>>> update
>>>>
>>>> union {
>>>>     int a;
>>>>     void* b;
>>>> }
>>>>
>>>> ? While you could forbid overlap of pointers and non-pointer data,
>>>> what about custom allocators, assembler, C libraries (including OS
>>>> runtime!), etc.?
>>>
>>>
>>> For the union, I might suggest a more acceptable tradeoff - mandate
>>> that some data be inserted before every union to tell the gc which
>>> member is selected at any moment during program execution. Whenever
>>> an assignment is done to the union, code is inserted to update the
>>> union's status. So your union would look more like this:
>>>
>>> enum StructStatus
>>> {
>>>     a,
>>>     b,
>>> }
>>>
>>> struct
>>> {
>>>     StructStatus status; // or size_t, whichever works
>>>
>>>     union
>>>     {
>>>         int a;
>>>         void* b;
>>>     }
>>> }
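
(For concreteness, here is roughly what the compiler-inserted bookkeeping
would have to amount to if it were written out by hand. The setA/setB
helpers are invented purely for illustration; nothing like them exists in
the language today:)

    enum Selected { a, b }

    struct TaggedUnion
    {
        Selected selected;   // maintained automatically in Chad's scheme

        union
        {
            int   a;
            void* b;
        }

        // Every write to a member would also update the tag:
        void setA(int value)   { a = value; selected = Selected.a; }
        void setB(void* value) { b = value; selected = Selected.b; }
    }
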
>>
>>
>> Well, you can't simply mandate that, I think.. It can be the case that
>> the exact structure of a struct/union is mandated by requirements
>> external to the application..
>>
>
> I think you can though, the same way that all D classes are prepended by
> a pointer to a vtable and a 'monitor', or the same way that dynamic
> arrays have not just a pointer, but an added length variable.
>
>> And, even if it was an option, it doesn't really solve the precision
>> issue. To be precise, the GC must know the internal structure of
>> anything on its heap, which I don't think is possible unless the
>> language were severely restricted or the programmer were required to
>> supply that information manually whenever the purpose of a part of
>> memory changes from pointer to non-pointer and vice versa.
>> Both options suck quite a lot.
>>
>
> But it can be precise with said union. It will know at compile time
> where all of the unions are stored, in the same way it knows where
> pointers are stored. Then, when it looks at the unions, it determines
> at runtime whether or not they contain pointers by using the extra
> member info. Either the member info says the current member is a
> pointer, or it says that the current member isn't a pointer. There is
> no maybe.
>
> IMO this is not a severe restriction, and it does not require any
> programmer intervention. The compiler writes all of the code that sets
> unions' member info variables, and to the programmer the member info is
> readonly and needs no maintenance.
>
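
(Continuing the sketch above, the check on the collector's side really
would be trivial -- markPointer is a made-up stand-in for whatever the
GC's marking routine is:)

    void markPointer(void* p);   // stand-in for the GC's marking routine

    void scanTaggedUnion(TaggedUnion* u)
    {
        // The GC knows from compile-time layout info that a TaggedUnion
        // lives at this address; the tag tells it what the bits mean now.
        if (u.selected == Selected.b)
            markPointer(u.b);   // live member is a pointer: follow it, and
                                // patch it if its target block gets moved
        // else: live member is just an int -- ignore it
    }
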
>>
>>> Not sure how custom allocators mess up the GC; I thought these were
>>> just on their own anyway. If a pointer to something is outside of
>>> the GC heap, the GC doesn't bother changing it or collecting it or
>>> moving anything.
>>
>>
>> That's true, but a custom allocator can use the GC heap if it wants.
>> For example, if I know I'll be allocating a million linked list nodes,
>> and will then release them all at once, it can be considerably faster
>> to allocate a large byte[] and use chunks of it for my nodes. But how
>> can the GC tell which of those bytes are pointers and which aren't? At
>> the beginning none are, but later some may be..
>>
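
(The pattern xs0 is describing looks something like this; the names are
mine, purely to make the problem concrete:)

    // A pool allocator carving list nodes out of one big GC-allocated
    // buffer.  When the buffer is allocated it contains no pointers at
    // all; as nodes are placed into it, some of its bytes silently
    // become pointers.
    struct Node
    {
        int   value;
        Node* next;      // a pointer hiding inside a byte[]
    }

    byte[] pool;         // e.g. pool = new byte[1_000_000 * Node.sizeof];
    size_t used;

    Node* allocNode()
    {
        Node* n = cast(Node*) &pool[used];
        used += Node.sizeof;
        return n;
    }
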
>
> Hmmm. In the case of the allocator I think it is the allocator writer's
> responsibility to use gc.addRange() on all of the pointers that they
> toss into the byte[]. What bugs me now though, is other arbitrary data
> cases, like std.boxer.
>
> So far the only answer I can think of is to make people dealing with
> arbitrarily typed data be very careful about gc.addRange'ing and
> gc.removeRange'ing any references that go into or out of their
> arbitrarily typed data, which does suck.
>
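
(A minimal sketch of that bookkeeping. I've used a malloc'ed pool so that
addRange/removeRange are unambiguously the right calls, and I'm assuming
std.gc's signatures are addRange(void* bottom, void* top) and
removeRange(void* bottom):)

    import std.gc;         // addRange / removeRange
    import std.c.stdlib;   // malloc / free

    byte* pool;
    const size_t poolSize = 1024 * 1024;

    void createPool()
    {
        pool = cast(byte*) malloc(poolSize);
        // The pool will come to hold references into the GC heap, so the
        // GC must scan it as a root range or it may free live objects.
        std.gc.addRange(pool, pool + poolSize);
    }

    void destroyPool()
    {
        // Deregister before freeing, so a collection never scans freed memory.
        std.gc.removeRange(pool);
        free(pool);
        pool = null;
    }
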
>>
>>> Assembler is a bit tricky; maybe someone smarter than I can handle it
>>> better, but here's a shot with some pseudo-asm:
>>>
>>> struct Foo
>>> {
>>>     int member1;
>>>     int member2;
>>> }
>>> Foo bar;
>>>
>>> ...
>>>
>>> Foo* foo = &bar;
>>> int extracted;
>>> // foo spotted in the assembly block, never mind the context;
>>> // as such, foo gets pinned.
>>> asm
>>> {
>>>     mov EAX, foo;        // EAX = foo;
>>>     add EAX, 4;          // EAX += 4;  (now points at foo.member2)
>>>     mov EAX, [EAX];      // EAX = *EAX;
>>>     mov extracted, EAX;  // extracted = foo.member2;
>>> }
>>> // foo is unpinned here
>>
>>
>> I'm sure simple cases can be detected, but am also fairly certain it's
>> impossible to generally determine which registers hold pointers and
>> which don't, and/or what will get referenced and what won't.
>>
>
> Yeah, I realized stuff like ASM that rewrites/rearranges the stack would
> mess up the GC. Perhaps we just need a comprehensive list of do's,
> don'ts, and workarounds in D asm programming?
>
>>
>>> As for C libraries, it seems like the same thing as custom
>>> allocators. The C heap is outside of the GC's jurisdiction and won't
>>> be moved or manipulated in any way. C code that handles D objects
>>> will have to be careful, and the calling D code will have to pin the
>>> objects before they go out into the unknown.
>>
>>
>> I meant precision again, not C's heap. Even if compiled D code somehow
>> notifies the GC what's on the stack, the C code definitely won't.
>> Then, how can the GC tell which parts of the stack point to its own
>> heap, and which are just integers that happen to look like they could
>> point to the heap?
>>
>
> It already knows where pointers are in each D-code stack frame; can't
> we make it also know which frames aren't D code at all? It would then
> simply skip the C frames. Any D code that could let references out
> into C code (the only way I know of is with extern(C)) should pin that
> object on the D side and keep a reference until execution has returned
> to D.
>
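
(Something like the following is what I understand by "keep a reference";
the C function and the registry array are invented for illustration:)

    // Hypothetical C API that stores the pointer we hand it for later use.
    extern (C) void c_store_callback_data(void* data);

    class Context { int state; }

    // A module-level array is always visible to the GC, so anything in it
    // cannot be collected while C still holds a raw pointer to it.
    Context[] liveContexts;

    void registerWithC()
    {
        Context ctx = new Context;
        liveContexts ~= ctx;            // keep it reachable from D
        // With a moving collector the object would additionally need to
        // be pinned (no such API exists today) -- keeping a reference
        // only prevents collection, not relocation.
        c_store_callback_data(cast(void*) ctx);
    }
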
>>
>>> Also, I wonder: if I were to make a tool that does escape analysis
>>> on your program and finds that a number of classes can either be
>>> stack allocated or safely deleted after they reach a certain point in
>>> the code, would this change the effectiveness of a generational GC?
>>> Perhaps part of why young objects die so often is that they are
>>> temporary things that can often be safely deleted at the end of scope
>>> or some such.
>>
>>
>> Well, I think it's quite hard to guess whether the result would be
>> better or worse, speed-wise. Obviously, allocating those objects on
>> the stack would speed things up, while using generations in the GC
>> would be less effective, because the GC would see fewer short-lived
>> objects. I've no idea, however, how those two effects would combine..
>> Most probably it depends very much on the actual application (as do
>> most things ;)
>>
>>
>> xs0

Why do you need a 100% moving GC anyway? Wouldn't it be possible to
combine a moving GC with mark-and-sweep for those cases (like unions and
asm) where you just can't be sure? In other words, a 'conservative
moving GC'. (Might be a nightmare to implement, of course.)
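
Roughly, the per-block decision during marking would be something like
this (pure pseudocode, every name is invented):

    struct Block
    {
        bool     marked;
        bool     pinned;    // true => sweep in place, never move
        void**[] fixups;    // precise slots to patch if the block moves
    }

    Block* findBlock(void* candidate);  // maps an address into the heap, or null

    void markWord(void** slot, bool precise)
    {
        Block* b = findBlock(*slot);
        if (b is null)
            return;                 // not a pointer into the GC heap

        b.marked = true;
        if (precise)
            b.fixups ~= slot;       // known pointer: safe to move and patch
        else
            b.pinned = true;        // ambiguous (union/asm/C stack): pin it
    }

Blocks that end up pinned would just get the normal mark-and-sweep
treatment; everything else would be free to be compacted.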