LRU cache for ~=

Tue Oct 20 07:48:31 PDT 2009

On Tue, 20 Oct 2009 10:05:42 -0400, Steven Schveighoffer  
<schveiguy at yahoo.com> wrote:

> On Mon, 19 Oct 2009 22:37:26 -0400, dsimcha <dsimcha at yahoo.com> wrote:
>
>> == Quote from Andrei Alexandrescu (SeeWebsiteForEmail at erdani.org)'s  
>> article
>>> dsimcha wrote:
>>> > == Quote from Andrei Alexandrescu (SeeWebsiteForEmail at erdani.org)'s  
>>> article
>>> >> dsimcha wrote:
>>> >>> == Quote from Andrei Alexandrescu  
>>> (SeeWebsiteForEmail at erdani.org)'s article
>>> >>>> dsimcha wrote:
>>> >>>>> Started playing w/ the implementation a little and I see a  
>>> problem.  What
>> about
>>> >>>>> the garbage collector?  There are two possibilities:
>>> >>>> [snip]
>>> >>>>> The only possible solutions I see would be to have the GC know  
>>> everything
>> about
>>> >>>>> the LRU cache and evict stale entries (probably slows down GC a  
>>> lot, a
>> huge PITA
>>> >>>>> to implement, couples things that shouldn't be tightly coupled),  
>>> or clear the
>>> >>>>> cache every time GC is run (probably would make appending so  
>>> slow as to
>>> > defeat the
>>> >>>>> purpose of having the cache).
>>> >>>> I think GC.collect may simply evict the entire cache. The  
>>> collection
>>> >>>> cycle costs so much, the marginal cost of losing cached  
>>> information is
>>> >>>> lost in the noise.
>>> >>>> Andrei
>>> >>> But then you have to copy the whole array again, likely triggering  
>>> another GC if
>>> >>> the array is large.  Then things really get ugly as, for all  
>>> practical purposes,
>>> >>> you've completely done away with the cache.
>>> >> This happens whether or not a cache is in use.
>>> >> Andrei
>>> >
>>> > But the array isn't guaranteed to get reallocated immediately after  
>>> *every* GC
>>> > run.  If you're appending to a huge array, the GC will likely run  
>>> several times
>>> > while you're appending, leading to several unnecessary reallocations.
>>> I don't think I understand this.
>>> 1. Request for an append comes that runs out of memory
>>> 2. GC runs and clears memory
>>> 3. Array is reallocated and the capacity cached.
>>> No?
>>
>> This is entirely correct.
>>
>>> > Each of
>>> > those unnecessary reallocations will increase the memory footprint  
>>> of your
>>> > program, possibly triggering another GC run and wiping out your  
>>> cache again in
>>> > short order, until, for sufficiently large arrays,
>>> >
>>> > a ~= b;
>>> >
>>> > is almost equivalent to
>>> >
>>> > a = a ~ b;
>>> I don't understand how the cache makes that all worse.
>>> Andrei
>>
>> The cache doesn't make anything *worse* than with no cache.  The only  
>> point I'm
>> trying to make is that, for large arrays, if the GC clears the cache  
>> every time it
>> runs, things would start to get *almost as bad as* having no cache  
>> because the
>> copy operations become expensive and the GC may run frequently.
>
> The cache can't be "cleared" every time, or else you might as well only  
> keep one LRU entry:
>
> int[] twos, threes;
>
> for(int i = 1; i < 10000; i++)
> {
>    twos ~= i * 2;
>    threes ~= i * 3;
> }
>
> At some point, twos or threes needs an allocation triggering a  
> collection, and that clears the cache, making the other array need an  
> allocation, clearing the cache, etc.
>
> I'd think you only want to clear the entries affected by the collection.
>
> -Steve

If it was free and simple to only clear the affected entries, sure. But  
doing so requires (very heavy?) modification of the GC in order to track  
and check changes. It also reduces collection performance. I think, that  
if GC allocations added entries to the LRU, and therefore the information  
in the LRU is never stale, you could avoid clearing the LRU. But this  
requires the LRU to be part of the GC.