(Semi) precise GC [was: Re: Std Phobos 2 and logging library?]
Fawzi Mohamed
fmohamed at mac.com
Wed Apr 15 05:57:28 PDT 2009
On 2009-04-13 20:33:53 +0200, Frits van Bommel
<fvbommel at REMwOVExCAPSs.nl> said:
> Leandro Lucarella wrote:
>> Frits van Bommel, el 13 de abril a las 19:36 me escribiste:
>>> Leandro Lucarella wrote:
>>>> Frits van Bommel, el 13 de abril a las 13:30 me escribiste:
>>>>>>> Or you can pin anything that's referenced from the stack, and move
>>>>>>> anything that is only referenced from the heap.
>>>>>> That's more likely to happen, but it requires a compiler change too
>>>>>> (provide type information on allocation). Maybe I wasn't too clear,
>>>>>> I didn't mean to say that a moving collector is impossible, what is
>>>>>> impossible is to make allocation a "pointer bump".
>>>>> The compiler already passes a TypeInfo on allocations IIRC. And
>>>>> TypeInfo can produce a TypeInfo[], it just happens that DMD and GDC
>>>>> don't fill it in for user-defined aggregates, and LDC needs a
>>>>> compile-time #define to enable it (because it breaks linking the Tango
>>>>> runtime, IIRC).
>>>>> (For other types, the fact that it returns null is a simple library issue)
>>>> Well, this is nice to know (even when it's not used yet, it's better than
>>>> nothing). And how can the GC obtain this kind of information?
>>> Well, since the allocation routines should all get a TypeInfo reference
>>> from the compiler, the GC can store the typeinfo for each memory block
>>> somewhere, and later use it. It can then call ti->offTi() which should
>>> return an array of OffsetTypeInfo structs (see object.d[i]). The only
>>> caveat is that those array return values should be statically allocated;
>>
>> But right now gc_malloc() doesn't take any TypeInfo argument. I can't see
>> where I can get the TypeInfo in the first place =/
>
> Ah, you're right. But if you'll look at your nearest lifetime.d[1]
> you'll see that all the allocation routines called by the compiler *do*
> provide a TypeInfo, so apparently it's just not propagated to gc_*. So
> I guess the first thing to do would be to either
> (a) change the signature of gc_{malloc,calloc,extend}()
> or
> (b) add something like gc_settype(void*, TypeInfo)...
>
>
> [1]: Tango name, and presumably druntime as well; I think it's spread
> all over the place for Phobos 1.
>
>>>>> I have no idea how efficient this would be, however. My guess would be
>>>>> not very.
>>>> I'm not concerned about efficiency; I'm more concerned about non-trivial
>>>> compiler changes.
>>> Well, efficiency is important too.
>>
>> Sure, and it's really hard to guess how efficient that could be (you
>> lose some efficiency in some cases but you probably gain a lot in other
>> cases if most allocations are a pointer bump). What I meant is that I can
>> test efficiency, to see if this is really viable or not, but it's very
>> hard for me to change the compiler (and it's much harder to get those
>> changes accepted "upstream", and one of my thesis goals is to
>> make something useful, that can be easily adopted, not just an academic
>> curiosity =).
>
> Well, if it turns out to be a win, I'm sure we could put it into LDC.
> DMD would be up to Walter.

And Tango will certainly welcome a new GC implementation, too.

Most of the issues, and what has to be modified to address them, were
already discussed. Personally I prefer a blocked approach (i.e. runs of
flag+size) to a full bitmap; in the future the compiler could cluster
pointer-typed fields together to reduce the number of blocks. Subclassing
means that you will always have several blocks, but that is still probably
better than the bitmap. I don't like that at the moment TypeInfo takes
up so much space (at least the size of the type).
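
To make the flag+size idea concrete, here is a minimal sketch in C. It is
not code from Tango, druntime or LDC: PtrRun, scan_runs, count_slot and
bitmap_bytes are hypothetical names invented for illustration, and the
layout (one flag plus a length in machine words) is just one possible
encoding of "blocks".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical run descriptor: `words` consecutive machine words that
   either all may hold pointers (is_ptr = 1) or all hold plain data. */
typedef struct {
    unsigned is_ptr : 1;
    unsigned words  : 31;
} PtrRun;

/* Example visitor: counts the slots it is shown (ctx points to a size_t). */
void count_slot(void **slot, void *ctx)
{
    (void)slot;
    ++*(size_t *)ctx;
}

/* Walk an object described by `runs` and call visit() on every word that
   lies inside a pointer run. Data runs are skipped in a single step.
   Returns the number of words visited. */
size_t scan_runs(void **obj, const PtrRun *runs, size_t nruns,
                 void (*visit)(void **slot, void *ctx), void *ctx)
{
    size_t word = 0, visited = 0;
    assert(visit != NULL);
    for (size_t r = 0; r < nruns; ++r) {
        if (runs[r].is_ptr) {
            for (size_t i = 0; i < runs[r].words; ++i) {
                visit(&obj[word + i], ctx);
                ++visited;
            }
        }
        word += runs[r].words;   /* advance past the whole run */
    }
    return visited;
}

/* For comparison: bytes needed by a full bitmap with one bit per word. */
size_t bitmap_bytes(size_t words)
{
    return (words + 7) / 8;
}
```

With this encoding a type laid out as "2 data words, 3 pointer words, 1 data
word" needs three runs, whatever the object size; a subclass that appends
fields adds runs at the end, which is why clustering the pointer fields
would keep the run count low.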

Apart from offTi (which is filled in correctly only on LDC, as far as I
know), tango.core.RuntimeTraits could be useful for getting all the type
information.

It would be good to add support for weak pointers (which at the moment are
normally stored as non-pointers); fvbommel had a place for them in his
enum values.

At the moment the values in the registers are dumped, but not read
back. Either you change that, or all those values should be pinned
(just like all union/maybe-pointer values).

Tango's I/O uses void[] arrays to take advantage of the implicit cast, but
these do not contain pointers (and the GC knows this, because at the moment
the flags used for an array are the ones it was first allocated with).

During the collection you need to stop the threads (at least in the
moving GC algorithms, and in the current mark and sweep).
While the threads are stopped you have very stringent constraints,
basically the same constraints as for a signal handler.
You cannot call any function that is not signal-safe; you cannot even
acquire POSIX locks.
So try to do as little as possible in that phase, and be very careful.