RCArray is unsafe

Mon Mar 2 23:17:09 PST 2015

On Sunday, 1 March 2015 at 15:44:49 UTC, Marc Schütz wrote:
> Walter posted an example implementation of a reference counted 
> array [1], that utilizes the features introduced in DIP25 [2]. 
> Then, in the threads about reference counted objects, several 
> people posted examples [3, 4] that broke the suggested 
> optimization of eliding `opAddRef()`/`opRelease()` calls in 
> certain situations.
>
> A weakness of the same kind affects DIP25, too. The core of the 
> problem is borrowing (ref return as in DIP25), combined with 
> manual (albeit hidden) memory management. An example to 
> illustrate:
>

1 >     struct T {
2 >         void doSomething();
3 >     }
4 >     struct S {
5 >         RCArray!T array;
7 >     }
8 >     void main() {
9 >         auto s = S(RCArray!T([T()])); // s.array's refcount is
10> now 1
11>         foo(s, s.array[0]);           // pass by ref

         ++  s.array[0] = myt;             // Would also be invalid
12>     }
13>     void foo(ref S s, ref T T) {
14>         s.array = RCArray!T([]);      // drop the old s.array
15>         t.doSomething();              // oops, t is gone
16>     }

1. Assignment to RC types is not the same as assignments to 
non-RC types.
2. Allocation of RC types is more than just allocating of non-RC 
types.

Using the above example:

#9: The assignment and allocation does two things:
    a. Decrements the current RC of s.array(here it is null so no 
RC is used)
    b. Increments the ref count of the allocated RCArray by one.

#11: we pass both s and a referent to inside S. This is odd 
behavior and not technically required in this case(just pass S). 
Of course, we can't expect such behavior to creep up in complex 
code.

#14. In this case the behavior is correct. The assignment does 
the following:
    a. Decrements the current RC of s.array(here was 1, so now it 
is 0)
    b. Increments the current RC of the newly allocated RC array 
and assigns it to s.array.

#15. Since T refers to the old s.array, which now has a ref count 
of 0(which we can assume to then be unallocated), line 15 becomes 
a run-time error.

---

How to fix?

It seems that the only way to make it work well is to use sort of 
"lazy" ref counting.

Here references are created at the start of functions and 
decremented at the end of functions... rather that in place(which 
then causes problems with things that happen after it).

But this can't work as in the ++ I added. Here it would solve 
your problem but not fix the a generalization of it.

---

Is there a solution? Essentially no. Only at the end of the 
program could we possibly have the last function release all RC 
counted arrays. Since it is the end of the program it is the only 
time we can know that no more RC types will be used.

The problem is analogous to the multiple-inheritance problem.

---

Flow analysis can only help so far(it won't help with dynamically 
allocated types that are allocated depending on some "random" 
factor.

---

Note that since t is pointing to a part of S, technically we 
can't release s until t is released.

So, effectively, S has to be ref counted too! And it's ref count 
must check check T's ref count. If it is non-zero then the it 
can't release itself(or the reference to it's array).

Also, something eventually has to release S(or which is then 
s.array). When? Well, since t depends on S, it is when t is 
released. So we would naturally have to inform t that it should 
too release s when it is released.

----

So, if that sort of makes sense I'll explain the solution: It 
might be a bit fuzzy but I'll try to make things very clear. [I 
will also sometimes conflate the usage of concrete objects and 
their corresponding abstract types. e.g., "T with t". It should 
be clear context which is meant. e.g., If I say "T is released", 
I mean that the object with type T was released. When they need 
to be distinguished between I will do so(mainly in the code 
examples]

A *reference counted* type, from here on designated as RCT, is a 
type that only allows itself to be released to the heap when 
nothing depends on it. [Obviously if anything depends on it when 
after it is released will fail to be fulfilled]

If a RCT does not contains any references to any RCT's we will 
call it a free RCT or FRCT. Such types behave in a simple way. 
They can be free'ed completely immediately. Since FRCT's do not 
contain any references to other RCT's all their references are 
simple references that can be free'ed immediately.

We require that if a type contains a reference to a RCT then it 
too is a RCT. [the relation ContainsRCT(T1,T2) ==> 
ContainsRCT(T2,T1) ==> Both T1 and T2 are either RCT's or not 
RCT's] This isn't a big deal but without it we end up requiring 
all types to be RCT's. Here we allow some types to be not be 
RCT's.

In code,

class T { }   // a standard non-reference counted type.
               // It can't use any RCT's as references. It is your 
basic D type of class.

rcclass FRCT  // rcclass means reference counted class. It is 
identical to your basic
               // D type of class but has different keyword 
designator(which is why we call them free).
{
   int x;
   float[] y
   // etc, FRCT does not point to any RCT's
}

rcclass X { }  // Just another RCT

rcclass RCT
{
     FRCT myFRCT;
     int y;
     RCT foo(ref barRCT);
     X x;
     // etc Basically this is a class that can include other RCT's
}

----

Ultimately we could simply use the basic D class but we would 
have to designate every type as an RCT type... even the basic 
types. I'm trying to avoid some basic issues of that method that 
would derail the discussion. [But for those that want to know: 
one could simply treat all objects in d as rcclass types describe 
above, even basic types and such, but optimize out the reference 
counting part in in the compiler for those basic and FRCT 
types... this then would simply lead back to the method I'm using 
to describe it but simply some of the details were hidden in the 
"simple method" by the compiler]

So, once you have such a type system. The way to deal with the 
complex relationships that exist are to know them:

1. A RCT have the following relationships between it's containing 
types. We will denote T to be a non-RCT type, FRCT to be an FRCT, 
and RCT to be an RCT.

      RCT can contain T's, FRCT's, and/or, RCT's.

      In Code:

      rcclass RCT
      {
          T t;
          FRCT myFRCT;
          RCT myRCT;
      }

In this case, which is the only case we will talk about w.l.o.g), 
when an RCT is released it must attempt to release each of it's 
references... in example above: t, myFRCT, and myRCT. Since t is 
a simple type, it can be released immediately. No RCT's point to 
it because if they did, then T would be an RCT and it isn't. 
Similarly myFRCT can easily be released because it is just a 
complex of things like T's. In this case though there is 
something slightly different about it which is the referencing 
counting part. Essentially it's contained types are simple types 
but it still has a reference counting aspect for itself.

Basically T's and FRCT's can free themselves when they need to. 
This is because they know how to free each reference they contain 
and it can't cause referential problems simply because they do 
not contain such references. This by no means they won't cause 
other parts of code to access invalid references. A pointer to 
such a member of a class can easily try and access the member 
after the object has been freed. But this is not our concern 
because we are only dealing with RCT's. Our goal isn't to have 
everything be RCT's. If that was the case then the T and FRCT 
cases would not be needed since we won't use them. e.g., We might 
want the GC to handle non-RCT and some other method for FRCT's. 
This allows one to build on current D system. But one must 
remember that RCT's only guarantee valid referencing when they 
access other RCT's.

The last sentence above directs where we are going with this: 
Since we have RCT we can make sure they know how to communicate 
with each other. (we are creating a metatype here)

By having the proper communication system here it is possible for 
the types to know when they need to be released.
In fact, it is not hard and several methods can be used, even 
simultaneously.

Axiom: Every RCT has the ability to have any other RCT that 
contains a reference to it, to be able to release it. Also, vice 
versa!

[This requirement essentially allows the very final reference to 
know that it needs to release other dependencies it had and 
allows it to not only clean up itself but everything else it 
needs to clean up]

Once you have the above requirement of RCT's(The implement dealt 
with at a different time), you can lazily release objects in the 
background without issue(fragmentation of the heap is a harder 
matter but still has a solution because of the above requirement).

How do they communicate:

Your example will do:

     rcstruct T
     {
         void doSomething();
     }

     rcstruct S
     {
         T[] array;
     }

     void main()
     {
         auto s = S(new T());  // Here when the compiler creates 
the array of T's S knows T is a RFC.
                               // In this case it can get at the 
hidden communications layer of T and
                               // inform T that it is using it. 
Hence at this point
                               // Both S and T can communicate 
with each other! (not just blind inc/dec of an int
         auto t = s.array[0];

         foo(s, s.array[0]);   // nothing special except foo much 
create an added reference to the two argments
                               // This can be optimized out as 
long as foo returns once for every time it
                               // is called. In this case, At the 
start of foo, the counters will be inc'ed
                               // and at the end they will be 
dec'ed. Always a zero sum outside the function call.
         t.doSomething();      // Skip to foo, come back when done 
with foo! Great, t is still around!
                               // (RC is still zero at this point!)
     }

     void foo(ref S s, ref T t)
     {
         s.array = new T[];    // Here we are reassigning the 
array. No problem! We simply inc the ref count on the new array 
and decment it on the old. Just like normal ARC'ing! The only 
different is that t is not immediately release! Even though it 
has a RC count of 0. Think of it as lazy ARC'ing... or LARC'ing! 
;)
         t.doSomething();      // Great, even though t has an RC 
count of zero, it hasn't been released yet!
     }

So, what is this LARCING?

Essentially instead of s simply dumping t in foo, it informs t 
that it is no longer using it. This sets the RC = 0 !!BUT!! t 
does not free itself immediately! If it does, it leads to the 
problem you have given and also to the extended problem I have 
shown above(calling t.doSomething in main).

If t sticks around though, we are not releasing memory and hence 
what is the point?

Well, as usual this is an engineering problem. We have a few 
cases.

1. The compiler can prove that a RCT goes out of scope and is not 
used any more. In this case t can release itself at the end of 
the function where it is provable that it is no longer used. (In 
your example it is at the end of foo ad in my example it is at 
the end of main) Flow analysis, simple brute force checking, etc 
are used here. Most cases in program will fall under this 
category. Local variables that are not referenced to by global 
variables are proven to be releasable at the end of the function 
call. Hence, many cases will work as simple ARC'ing and not 
require LARC'ing.

2. The compiler cannot prove when a variable is release. This is 
when the type itself has to decide when it is not being used. 
(this the last part of the problem and your example falls under 
it)

In this case, the communication between S and T is crucial. When 
s decides it no longer need t, it informs t. t has a list of 
every reference that is using it(this can just be a pointer to 
the ref
count of the RCT). Having this list, while potentially weighty, 
lets t know when all the references to it are not needed any 
more(sort of chains or links between any other RCT and t have 
been broken).

Once this is done it is a simple matter of t trying to release 
itself every time it goes out of scope.

      a. At some point t will release itself
      b. t will never be able to release itself(this is generally 
due to program error in which a reference was made to t without 
removing the reference when it was done). Note this is why if 
every type is an RCT, prevents dangling(e.g., we would have full 
LARC'ing... or FLARC'ing.). FLARC'ing may not be optimal 
though(all pointers would have to be RCT which would probably be 
quiet inefficient. Also interdependencies are handled easily:

          s.t = t;
		 t.s = s;
          while (!isFree(t) && !isFree(s))
          {
		     s.t = null;
			 t.s = null;
              if (t.refCount == 0)
                 s.release();
              if (s.refCount == 0)
                 t.release();

          }

          The reason that t and s are released in LARC'ing is 
because at the 2nd time through the loop s.release() knows that t 
no longer depend and can release itself immediately(because t.s = 
null set s's *t entry* to 0 on it so now s knows that t doesn't 
depend on it). This is just a prototype that can be used to show 
that any level of nesting between types will eventually resolve 
all releases properly.

      c. Alternatively, a GC like memory manager can scan through 
the types and release all types who's reference count = 0 and 
who's dependencies are all invalid(nothing depends on it). It can 
do this in the background without issue and can compact 
consecutively release objects without issue. i.e., the GC knows 
what part of the heap belonged to objects there were ref counted 
and if it is proven they have any references then it can mark 
them as free'ed memory. Consecutive free'ed objects in memory can 
be combined since they never will be used again. This can be done 
behind in parallel and is sort of minor defragmentation. In fact, 
analogous to defragmenting a drive. If you know a file is not 
being used then you can defragment it in the background and move 
the various parts around without issue because you know it is not 
being used. Once it is being uses, you run into issues with 
having to stop the threads that are using that part of the heap. 
(I think this limitation could ultimately be removed for most 
practical cases.)

	 In this case the GC would have to have a list of all objects of 
RCT's. Essentially the GC is given to the task of LARC'ing in the 
background.

Finally, which such an structural approach(Essentially 
implementing the reference list, which is a random access list 
containing all the references(pointers to all classes that use 
the type))  to ref counting becomes pretty stable and in probably 
99%+ of program usage, statically proven to release memory(step a 
above, The compiler inserts many static releases that are 
released immediately and many types will be simple types that 
don't need LARC'ing).

The two issues with this, as always, are speed and memory usage. 
I think it should rather fast(most releases are direct and just 
release themselves since they don't have references. On top of 
that the remaining are mostly FRCT's which can be released nearly 
immediately(they can, but don't until they are informed it is ok).

For the full blown RCT types, the way they release things is in a 
hierarchical manner which can create avalanches of changes. If 
many objects are waiting for one object to be released, then when 
it does, they all could be released immediately. But this chain 
reaction is simply lazily recursive and eventually terminates.

The only real problem is how to deal with un-managed references 
to RCT types. These are references(say pointers) to RCT's that 
may not be "smart"(inform the RFC that it is no longer being 
refered to. A simple pointer to a class that doesn't do any 
communication is the prototypical problem.

The only solution seems to be to just accept it. It is a 
necessary evil. Allow it but require the program to know the 
intricate details. Such pointers are needed for speed and don't 
inherently cause any problems. It's only when they are abused can 
issues creep in.

If one, say, had RCT pointers that are used for some types of 
problems and simple pointers when they are needed, it becomes 
pretty marginal of an issue. (if you are doing driver programming 
and have to have speed and direct access then use the simple 
pointers but it is your just to know if your references are 
working as expected.

In many cases, RCT pointers can be used in many of the cases 
which make further minimize the singular problem case(that of 
non-informative referencing). (We could call such types "Peeping 
Toms"... but as long as they only peek it is ok(not by law, in 
programming ;)).

At this, imagine that such a system was implemented. It would 
provide almost complete automatic memory management coverage with 
releasing resources as they are generally released in the program 
flow. In fact, with a GC(don't you just hate it when your garbage 
man doesn't pick up your trash sometimes? specially if he never 
picks it up) that essentially works parallel, most of the actual 
releasing can be done in the background without ever stopping the 
program. The benefits are: Better real-time behavior, Some 
programs might never need to compact the heap(in fact, if 
intelligent heap allocation is used, probably most programs could 
benefit from it), and debugging is much easier(Essentially more 
like manual management for most objects... just the compiler 
handles for us, but we know when, most likely, an object goes out 
of scope it will be released). Seg faults due to invalid memory 
access will almost be reduced to 0(since the corner case... the 
problem case of simple references is not used as much as all the 
other ways to use objects).

The downside? There really isn't one except memory usage! One can 
come up with special cases that simply don't work well under 
LARC'ing but because it allows one to use non-LARC'ing types, the 
programmer will just have to manage it on is own or a mixed 
combination.

Another great benefit is performance. One may think with all the 
all the extra allocation and deallocation code that the end's of 
functions would be slowed down(do to objects being lazily 
released). This is not necessarily the case because it is 
LARC'ing, the releases can be done later... say outside a 
performant piece of the code. It can even be faster than directly 
releasing it because that could be slow in and of itself, which, 
again, with LARC'ing, we can chose a better time to release the 
object or do it in the background.

Unfortunately, if I'm not mistaken, with FLARC'ing, all classes 
would essentially double in size. This is because if every 
reference the type has is a RCT then it would essentially need to 
store a pointer to the object as to provide a link to it. With 
limited LARC'ing I think one could probably achieve a good 
balance. (for large classes and/or a large number of objects, one 
could just avoid LARC'ing to prevent the overhead and doing it 
manually or with GC)

Some codetalk(tm) of how the interaction goes:

      Object s is being initialized. It is a RFC type. It has, as 
a member, a object x that is also a RFC type.  In the course of s 
being initialized, it initializes x. We'll assume it is not null.
	 To do this s creates the object for x then informs x that it 
has a reference to it(we could have multiple types of references 
for further optimization if we want). e.g.,

	     x = new X;
	     x.CreateReference(s);
		 s.CreateReferenec(x); // We use directionless binding 
references

     Now, suppose at some point s is being released. At the end of 
scope the compiler inserts

	     s.Release();

     S now tries to release itself. It checks it's ref count and 
the list of all references it has.

	     e.g.,
		 refCount = 0;
		 refList = {}

		 then in s.Release()

			if (s.refCount == 0 && s.refList.IsEmpty)
			{
				GC.RemoveRCT(s); // GC.RemoveRCT is not necessary needed but 
using it hear for expressive purposes.
				s.Free();
			}
			else
				GC.AddRCT(s);   // Here, since the compiler potentially isn't 
going insert another s.Release() outside the function we have to 
let the GC handle it

		 in this case s free's itself immediately.

		 If either refCount != 0 or refList != empty, the GC is given 
control(it is made aware that S needs to be periodically checked 
to see if it can be released.(the GC simply does the
		 if (s.refCount == 0 && s.refList.IsEmpty) s.Free(); in a 
parallel thread. If s.refCount == 0 and s.refList is empty at 
some point then the GC ends up free'ing s.

		 Also, the compiler may be able to insert another s.Release() 
in the outer scope of the function all if it releases that it s 
was probably not released before. This doesn't hurt anything 
except an extra check that might be wasted(profiling could remove 
this check or even insert it).

		 The only way s can't be released is if it's refCount and 
refList are not empty. This is "danling" references and can only 
happen when mixing non-RCT's with RCT's.

		 So we know how s should release it self but how does t know 
that s no longer needs it? (so t can release itself at some point)

		 well, this has to do when x is assigned to. e.g.,

		 s.x = new X;
		 // Hidden, we have s update it's ref list and inform both the 
hold x and the new x what is going on.

		 here the assignment will need to remove the old x from refList 
and add the new x. This keeps the the list up to date. (there is 
a big corner case here that I believe would never occur but will 
have to think about it more)

Any ways, I'll leave it at that for now!