Array Slices and Interior Pointers

Rainer Schuetze r.sagitario at gmx.de
Tue Dec 11 12:24:56 PST 2012



On 11.12.2012 18:25, Alex Rønne Petersen wrote:
> On 11-12-2012 08:29, Rainer Schuetze wrote:
>>
>> On 11.12.2012 01:04, Alex Rønne Petersen wrote:
>>> http://xtzgzorex.wordpress.com/2012/12/11/array-slices-and-interior-pointers/
>>>
>>
>>  > This is clearly a huge problem for type-precise garbage collection.
>>
>> I don't see problems here. If a memory block is referenced, all of its
>> contents remain in memory, so they are scanned with their full type
>> info. Or do you want to chop off unreferenced parts of the memory block?
>
> No, the problem I was getting at is:
>
> Suppose we have a field int* p; somewhere in the GC heap. With the
> current state of affairs, we have to consider that this field can hold a
> value that is either:
>
> a) null (we don't care)
> b) a pointer into C memory (we don't care)
> c) a base pointer into the GC heap (unlikely but possible if "new int"
> was used somewhere)
> d) an interior pointer into the GC heap (much more likely; a pointer to
> a field of another object)
>
> So we have to look at the pointer and first figure out what kind of
> memory block it is /actually/ pointing to before we have any kind of
> type info available (just the knowledge that it's of type int* is not
> particularly useful by itself other than knowing that it could be a
> pointer at all).

At least for the D GC, the major work is to figure out if the pointer is 
pointing to GC memory or not. Once that is done (i.e. a pool of 
contiguous memory is found that contains the addressed memory) it's just 
a table lookup for the size and corresponding address alignment to get 
the base of the referenced GC memory block.
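
To make the two steps concrete, here is a rough sketch in C. It is not the actual druntime code: `Pool`, `find_pool`, `block_base` and the per-page size table are names made up for illustration, and it assumes every block size divides the page size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096u

/* Hypothetical pool: a contiguous run of pages with one block size
   (in bytes) recorded per page. */
typedef struct {
    uintptr_t       base;        /* address of the first page */
    size_t          npages;      /* number of pages in the pool */
    const uint16_t *block_size;  /* per-page table of block sizes */
} Pool;

/* Step 1, the expensive part: is addr inside any GC pool at all? */
static const Pool *find_pool(const Pool *pools, size_t npools, uintptr_t addr)
{
    for (size_t i = 0; i < npools; i++) {
        if (addr >= pools[i].base &&
            addr - pools[i].base < pools[i].npages * PAGE_SIZE)
            return &pools[i];
    }
    return NULL;
}

/* Step 2: table lookup for the size, then align down to the block start.
   Returns 0 if addr does not point into GC memory. */
static uintptr_t block_base(const Pool *pools, size_t npools, uintptr_t addr)
{
    const Pool *p = find_pool(pools, npools, addr);
    if (p == NULL)
        return 0;

    size_t offset = addr - p->base;
    size_t size   = p->block_size[offset / PAGE_SIZE];
    assert(size != 0 && PAGE_SIZE % size == 0);

    return addr - (offset % size);
}
```

The linear search in `find_pool` only keeps the sketch short; a real collector would presumably use a sorted pool list or a similar structure. The point is that once the pool is known, an interior pointer costs one table lookup and one modulo.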

>
> With my scheme, the possibilities would be:
>
> a) null (we don't care)
> b) a pointer into C memory (we don't care)
> c) a base pointer into the GC heap where the memory block is of type int*
>
> Notice how we did not have to do any significant work to figure out what
> we're dealing with; we immediately know what kind of typed memory the
> pointer is pointing to.

This stores the type info with the reference, not with the memory block, 
but it does not make a big difference. (Actually it does: if the 
reference is only a reference to a base class of the actual instance, 
type info is lost.)

>
> This becomes more of an advantage with aggregates. Suppose we have:
>
> struct A
> {
>      // ... more fields ...
> }
>
> And we have a field A* p; somewhere in the GC heap. We can now look at
> it and immediately tell whether it's a case of a, b, or c above and can
> trivially continue scanning into the pointed-to memory (if needed).
>
> So the TL;DR is: We avoid extra work to figure out the actual type of
> the memory something is pointing to by simply making such cases illegal.
>
> Whether that is practical, I do not know, and I don't plan to push for
> it anytime soon at least. But it has to be done for D to ever run on the
> CLI.

I understand that the CLI forbids interior pointers, but that seems to be 
an implementation detail of its GC.

>
>>
>>  From your post, it seems these are restrictions imposed by the .NET GC,
>> not by slices in general. If you take a pointer to a field inside a
>> struct, you will again get an interior pointer. Do you want "fat pointers"
>> for this as well?
>
> Sure, there's nothing wrong with slices if we assume all GCs that'll be
> running in a D implementation support interior pointers. But if we make
> this assumption, D can never run on the CLI.
>
> Interior pointers are OK in the stack and registers, so taking pointers
> to fields inside aggregates should be fine so long as they are not
> stored in the heap.
>

I don't think we should adopt rather strange semantics that introduce 
different kinds of pointers and targets depending on whether they live 
on the heap or the stack.

The best that could be done for a .NET target build would be to let the 
compiler create fat pointers that always store the base of the memory 
block and an offset, not just for slices.
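
Roughly, such a fat pointer could look like the following C sketch. `FatPtr`, `fat_field` and `fat_deref` are hypothetical names, and this is only meant to show the layout, not what a compiler would actually emit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fat pointer: the collector only ever sees `base`,
   which always points at the start of a memory block. */
typedef struct {
    void  *base;    /* start of the memory block */
    size_t offset;  /* byte offset of the target inside the block */
} FatPtr;

/* Example aggregate whose field we want to point at. */
typedef struct {
    int x;
    int y;
} Point;

/* Build a fat pointer to the byte at `offset` inside `block`. */
static FatPtr fat_field(void *block, size_t offset)
{
    FatPtr p = { block, offset };
    return p;
}

/* Recover the raw interior address only at the point of use. */
static void *fat_deref(FatPtr p)
{
    assert(p.base != NULL);
    return (char *)p.base + p.offset;
}
```

A pointer to `y` would then be created as `fat_field(&pt, offsetof(Point, y))` and read as `*(int *)fat_deref(p)`. The cost is obvious: every such pointer doubles in size, and every dereference needs an extra addition.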

BTW I was also wondering whether "instrumented" pointers should be used 
to support a GC that works without "stopping the world". E.g. they would 
make it possible to keep track of references to each memory block 
continuously, or to remember which references were changed since the 
last scan, in the hope of doing incremental/generational scans.
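
As a very rough illustration of the second idea (remembering changed references), here is a C sketch in which every pointer store goes through a small barrier that sets a dirty flag. `TrackedHeap`, `store_ref` and `scan_dirty` are made-up names, and the fixed-size slot array is just a stand-in for real heap memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NSLOTS 8

/* Hypothetical heap of pointer slots plus one dirty flag per slot. */
typedef struct {
    void *slot[NSLOTS];
    bool  dirty[NSLOTS];
} TrackedHeap;

/* The "instrumented" store: write the reference and remember that
   this slot changed since the last scan. */
static void store_ref(TrackedHeap *h, size_t i, void *value)
{
    assert(i < NSLOTS);
    h->slot[i]  = value;
    h->dirty[i] = true;
}

/* Incremental scan: visit only the slots written since the previous
   scan, clear their flags, and return how many were visited. */
static size_t scan_dirty(TrackedHeap *h, void (*visit)(void *))
{
    size_t visited = 0;
    for (size_t i = 0; i < NSLOTS; i++) {
        if (h->dirty[i]) {
            if (visit != NULL)
                visit(h->slot[i]);
            h->dirty[i] = false;
            visited++;
        }
    }
    return visited;
}
```

Whether the bookkeeping on every store pays off against shorter scans is exactly the open question; the sketch ignores threading entirely.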


More information about the Digitalmars-d mailing list