LRU cache for ~=

dsimcha dsimcha at yahoo.com
Mon Oct 19 15:39:23 PDT 2009


== Quote from Andrei Alexandrescu (SeeWebsiteForEmail at erdani.org)'s article
> dsimcha wrote:
> > == Quote from Andrei Alexandrescu (SeeWebsiteForEmail at erdani.org)'s article
> >> Making ~= work well for slices does not preclude creating a distinct
> >> library type.
> >> Andrei
> >
> > Yes, I see LRU/MRU/whatever as a kludge to allow convenience without being unsafe
> > egregiously inefficient.  The fact that it is a kludge is not a criticism because
> > there is clearly no easy answer, and thus a kludge is genuinely necessary.
> > However, I think there needs to be a separate array builder type for "heavy duty"
> > appending operations.  In TDPL I would just say that concatenating slices can be
> > inefficient and that people should use array builder for heavy duty appending,
> > length changing, etc., w/o getting into the details.
> But how about the opposite view. What if the *previous* implementation
> of GC and ~= were a kludge that led to egregious inefficiency and
> revolting unsafety, kludge that that now is getting fixed.
> What I'm saying is that with the cache in place we'll have slices that
> are safe and efficient. Right now they are not safe and not efficient. I
> can hardly find reasons to characterize the new state of affairs as kludgey.
> One surprising (but safe) behavior that remains with slices is this:
> void fun(int[] a) {
>     a[0] = 0;
>     a ~= 42;
>     a[0] = 42;
> }
> The caller may or may not see 42 in the first slot after the call.
> Andrei

Started playing w/ the implementation a little and I see a problem.  What about
the garbage collector?  There are two possibilities:

1.  The pointer gets stored as a pointer.  The array never gets freed until it
gets evicted from the cache.  If it's a huge array, this is very bad.

2.  The pointer gets cast to a size_t so it doesn't look like a reference.  Then,
we can have something like:

string foo;
foreach(i; 0..5) {
    foo ~= 'a';
}
foo = null;
GC.collect;

// Now we have (foo.ptr, 5) in our LRU cache, but foo has been GC'd.
string bar = "123456789";  // bar gets the memory reclaimed from foo.
string baz = bar[0..5];  // Same length, ptr as foo.
baz ~= '1';              // Finds stale information from foo in cache.
writeln(bar);  // Prints [1 2 3 4 5 1 7 8 9].

The only possible solutions I see would be to have the GC know everything about
the LRU cache and evict stale entries (probably slows down GC a lot, a huge PITA
to implement, couples things that shouldn't be tightly coupled), or clear the
cache every time GC is run (probably would make appending so slow as to defeat the
purpose of having the cache).



More information about the Digitalmars-d mailing list