[dmd-internals] Giving better static data limits to the GC

Thu Feb 2 03:50:03 PST 2012

Walter Bright, on February 1 at 13:37, you wrote to me:
> On 8/23/2010 5:32 AM, Leandro Lucarella wrote:
> >That's why I think the pointer bitmasks generation should be added to
> >DMD, it doesn't affect the runtime, unless you explicitly use that
> >information, it doesn't mean you have to provide a precise scanning GC
> >by default.
> 
> 1. I think your idea of grouping static data that doesn't need to be
> scanned separately from static data that does is a good one. It
> means putting them into a different 'segment' in the object file,
> analogously to the way moduleinfo and eh tables are emitted into
> different segments.

Good to know that it's even possible. I remember this having a huge
impact in some cases (where a big static data segment contained words
that looked like pointers into GC-managed chunks, which made garbage
immortal).
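
To illustrate what I mean by words that look like pointers, here is
a minimal sketch. The function name and the heap bounds are made up for
the example; this is not how the current GC is written, just the
conservative test in its simplest form:

```d
// Hypothetical helper: a conservative scan treats every word whose value
// lies inside the heap range [heapLo, heapHi) as a potential pointer.
size_t countLookalikes(const(size_t)[] words, size_t heapLo, size_t heapHi)
{
    size_t hits;
    foreach (w; words)
    {
        // Plain integers, hashes or bit patterns pass this test too,
        // and each hit keeps a heap block alive.
        if (w >= heapLo && w < heapHi)
            ++hits;
    }
    return hits;
}
```

If the static data that cannot hold pointers lives in its own segment,
the GC simply never runs this test over it.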

> 2. Adding simple bitmasks to the typeinfo doesn't work that well - what to do about:
> 
>     struct S { ... }
>     struct T { S[1000] s; }
> 
> It would generate a giant bitmap.

Yes, I know that increasing the size of the executable is always bad,
but having a program eat up all your memory is way worse. Also, this
can be made optional, so people who care about the size of the binary
and don't have problems with the GC don't have to pay that price.

Also, maybe some repetition pattern could be used, so the bitmap for
S is stored only once together with a repeat count (just thinking out
loud now).
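
Just to make the repetition idea concrete, a rough sketch. The struct
and its field names are hypothetical, and it assumes one element fits
in a single machine word worth of bits:

```d
// Hypothetical compact pointer map: the bitmap of ONE element plus the
// number of times that element repeats (e.g. S[1000]).
struct RepeatedPointerMap
{
    size_t elemBits;   // bit i set => word i of one element may hold a pointer
    size_t elemWords;  // size of one element in words (<= size_t.sizeof * 8)
    size_t count;      // number of consecutive elements

    // Is the word at wordIndex (counted from the start of the whole
    // array) a pointer slot?
    bool isPointer(size_t wordIndex) const
    {
        assert(wordIndex < elemWords * count);
        immutable within = wordIndex % elemWords; // position inside one element
        return ((elemBits >> within) & 1) != 0;
    }
}
```

For a struct S { int* p; size_t n; } the map for S[1000] would be
RepeatedPointerMap(0b01, 2, 1000): the same three words of metadata
no matter how long the array is.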

> 3. An idea (I think it was Andrei's) was for the TypeInfo for each
> type to have a virtual function that gets called by the GC to "scan
> this object". The advantage of this is it is completely flexible,
> and for many types custom marking code is going to be far faster
> than table driven, especially if that table is compressed.

What I don't like about this is that part of the GC implementation is
done by the compiler. Having a function to scan the object might be
flexible in one way, but it would be completely inflexible in others
(you take freedom away from the GC implementation).

And supposing you go this way, what would the signature of this
method be? How would it pass information to the GC on how to follow
pointers? Maybe I don't understand it completely, but when I try to
think about the details I can't get my head around it that easily.

Also, I don't know how this fits with moving collectors. How would you
be able to overwrite a pointer, and how would you know when something
must be scanned but not overwritten (union { size_t x; int* p; })?
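
The only shape I can come up with is something like the sketch below.
Everything in it (MarkSlot, scanTagged, the exact flag) is my own
guess, not an existing interface: the scan function hands the GC the
address of each slot, so a moving collector could overwrite it, plus
a flag saying whether the slot is definitely a pointer:

```d
// Hypothetical callback: the GC receives the ADDRESS of each pointer
// slot. exact == false means "might be a plain integer": the target
// can be kept alive, but the slot must not be rewritten.
alias MarkSlot = void delegate(void** slot, bool exact);

struct Tagged
{
    union { size_t x; int* p; } // ambiguous word
    int* q;                     // always a pointer
}

// What a compiler-generated scanner for Tagged might look like.
void scanTagged(void* obj, scope MarkSlot mark)
{
    auto t = cast(Tagged*) obj;
    mark(cast(void**) &t.p, false); // union member: scan, never overwrite
    mark(cast(void**) &t.q, true);  // safe to update if the target moves
}
```

Even in this form the callback type is fixed by the compiler, which
is the kind of inflexibility I was worried about.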

> 4. Eventually, I'd like to add to TypeInfo's an array of
> [offset,TypeInfo] pairs for the fields. While awesome in power,
> that's going to be slow for the GC to use, hence the idea (3) for
> that.

I think this would be awesome, even if slow, because it would give the
GC implementation a lot of information to experiment with, i.e. it
would be good for research.

Also the GC could pre-process that information and store it in an
internal format that's fast to process while scanning, so the only
overhead would be at allocation time, which I think is not that bad.
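
As a sketch of that pre-processing step: TypeDesc below is a
hypothetical stand-in for the [offset,TypeInfo] data, not the real
TypeInfo, and the output is one flag per word. A real GC would
probably pack this into bits and cache it per type:

```d
// Hypothetical stand-in for the per-type data of idea (4).
struct TypeDesc
{
    static struct Field
    {
        size_t offset;          // byte offset within the parent
        const(TypeDesc)* type;  // descriptor of the field's type
    }

    size_t size;      // size in bytes
    bool isPointer;   // true for pointer-like leaf types
    Field[] fields;   // empty for leaf types
}

// Walk the nested descriptors and flag every word that holds a pointer.
void flatten(const(TypeDesc)* t, size_t base, bool[] words)
{
    if (t.isPointer)
    {
        words[base / size_t.sizeof] = true;
        return;
    }
    foreach (ref f; t.fields)
        flatten(f.type, base + f.offset, words);
}

// Done once (e.g. at first allocation of the type); scanning then only
// needs the flat result.
bool[] pointerMap(const(TypeDesc)* t)
{
    auto words = new bool[(t.size + size_t.sizeof - 1) / size_t.sizeof];
    flatten(t, 0, words);
    return words;
}
```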

Maybe this should be the first thing done. This way you enable people to
use a GC that relies on that information without it needing to be the
default "official" GC. Maybe this is fast enough and you don't even need
to generate the virtual function to scan stuff. If you planned to do
this eventually, I really think it would be a good idea to do it sooner
rather than later.

-- 
Leandro Lucarella (AKA luca)                     http://llucax.com.ar/
----------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145  104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------
You are the very reason why everything happens to you