LRU cache for ~=

Tue Oct 20 07:05:42 PDT 2009

On Mon, 19 Oct 2009 22:37:26 -0400, dsimcha <dsimcha at yahoo.com> wrote:

> == Quote from Andrei Alexandrescu (SeeWebsiteForEmail at erdani.org)'s  
> article
>> dsimcha wrote:
>> > == Quote from Andrei Alexandrescu (SeeWebsiteForEmail at erdani.org)'s  
>> article
>> >> dsimcha wrote:
>> >>> == Quote from Andrei Alexandrescu (SeeWebsiteForEmail at erdani.org)'s  
>> article
>> >>>> dsimcha wrote:
>> >>>>> Started playing w/ the implementation a little and I see a  
>> problem.  What
> about
>> >>>>> the garbage collector?  There are two possibilities:
>> >>>> [snip]
>> >>>>> The only possible solutions I see would be to have the GC know  
>> everything
> about
>> >>>>> the LRU cache and evict stale entries (probably slows down GC a  
>> lot, a
> huge PITA
>> >>>>> to implement, couples things that shouldn't be tightly coupled),  
>> or clear the
>> >>>>> cache every time GC is run (probably would make appending so slow  
>> as to
>> > defeat the
>> >>>>> purpose of having the cache).
>> >>>> I think GC.collect may simply evict the entire cache. The  
>> collection
>> >>>> cycle costs so much, the marginal cost of losing cached  
>> information is
>> >>>> lost in the noise.
>> >>>> Andrei
>> >>> But then you have to copy the whole array again, likely triggering  
>> another GC if
>> >>> the array is large.  Then things really get ugly as, for all  
>> practical purposes,
>> >>> you've completely done away with the cache.
>> >> This happens whether or not a cache is in use.
>> >> Andrei
>> >
>> > But the array isn't guaranteed to get reallocated immediately after  
>> *every* GC
>> > run.  If you're appending to a huge array, the GC will likely run  
>> several times
>> > while you're appending, leading to several unnecessary reallocations.
>> I don't think I understand this.
>> 1. Request for an append comes that runs out of memory
>> 2. GC runs and clears memory
>> 3. Array is reallocated and the capacity cached.
>> No?
>
> This is entirely correct.
>
>> > Each of
>> > those unnecessary reallocations will increase the memory footprint of  
>> your
>> > program, possibly triggering another GC run and wiping out your cache  
>> again in
>> > short order, until, for sufficiently large arrays,
>> >
>> > a ~= b;
>> >
>> > is almost equivalent to
>> >
>> > a = a ~ b;
>> I don't understand how the cache makes that all worse.
>> Andrei
>
> The cache doesn't make anything *worse* than with no cache.  The only  
> point I'm
> trying to make is that, for large arrays, if the GC clears the cache  
> every time it
> runs, things would start to get *almost as bad as* having no cache  
> because the
> copy operations become expensive and the GC may run frequently.

The cache can't be "cleared" every time, or else you might as well only  
keep one LRU entry:

int[] twos, threes;

for(int i = 1; i < 10000; i++)
{
   twos ~= i * 2;
   threes ~= i * 3;
}

At some point, twos or threes needs an allocation triggering a collection,  
and that clears the cache, making the other array need an allocation,  
clearing the cache, etc.

I'd think you only want to clear the entries affected by the collection.

-Steve