Easy & huge GC optimizations

Etienne via Digitalmars-d digitalmars-d at puremagic.com
Fri May 23 09:33:55 PDT 2014


On 2014-05-23 2:17 AM, Rainer Schuetze wrote:
>
>
> On 22.05.2014 21:04, Etienne wrote:
>> On 2014-05-22 2:12 PM, Rainer Schuetze wrote:
>>>
>>> "NO_INTERIOR" is currently only used for the hash array used by
>>> associative arrays. It is a bit dangerous to use, as any pointer, slice,
>>> or register still operating on the array is ignored, so collecting it
>>> might corrupt your memory.
>>
>> That's quite a relief, I was afraid of having to do it ;)
>>
>> I'm currently exploring the possibility of sampling the pointers during
>> marking to check if they're gone, and using Bayesian probabilities to
>> decide whether or not to skip the pool.
>>
>> I explained it all here:
>> https://github.com/D-Programming-Language/druntime/pull/797#issuecomment-43896016
>>
>>
>>
>> -- paste --
>> Basically, when marking, you take 1 in X of the references and send them
>> to a specific array that represents the pool they refer to. Then, next
>> time you're going to collect, you test them individually and, if they're
>> mostly still there, you skip marking/freeing for that particular pool
>> during collection. You can force collection on certain pools every 1 in X
>> collections to even out the average lifetime of the references.
>>
>> You're going to want to have a lower certainty of failure for big
>> allocations, but basically you're using probabilities to avoid pushing a
>> lot of useless load on the processor, especially when you're in a part
>> of an application that's just allocating a lot (sampling will determine
>> that the software is not in a state of data removal).
>>
>> http://en.wikipedia.org/wiki/Bayes_factor
>>
>> -- end paste --
>>
>> The Bayes factor is merely there to choose the appropriate model that
>> fits the program. Bayesian inference would take care of deciding whether
>> a pool should end up being marked. In other words, machine learning.
>>
>> Would you think it'd be a good optimization opportunity?
>
> Hmm, I guess I don't get the idea. You cannot skip a pool based on some
> statistics; it might still contain references to anything. As a result
> you cannot collect anything.
>

It only skips the inner scan of the pool, much like marking it NO_SCAN, 
when the sampled pointers that pointed into it are still alive.

I mean, why would you want to check the pointers and mark every page in 
a memory zone when you know they're probably all there anyway? The idea 
is that you could manage to avoid collection altogether during periods 
of high allocation.
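
To make that concrete, here's a rough sketch of what the per-pool 
bookkeeping could look like. This is hypothetical code, not druntime's 
actual pool structures; the names (SampledPool, poolBase/poolTop, 
maybeSample, shouldSkipScan) and the 1-in-16 rate and 90% threshold are 
made up for illustration only:

    import std.random : uniform;

    enum SAMPLE_RATE = 16;      // record 1 in 16 references seen while marking
    enum SKIP_THRESHOLD = 0.9;  // skip the pool if >= 90% of samples still point in

    struct SampledPool
    {
        void* poolBase;     // start of the pool's address range
        void* poolTop;      // end of the pool's address range
        void*[] refSlots;   // addresses of the sampled references into this pool

        // Called from the mark loop when a reference into this pool is found;
        // `slot` is the address of the word that held the reference.
        void maybeSample(void* slot)
        {
            if (uniform(0, SAMPLE_RATE) == 0)
                refSlots ~= slot;
        }

        // At the start of the next collection, re-read each sampled slot and
        // check whether it still points into this pool. If most do, the pool
        // is probably still heavily referenced and its scan can be skipped.
        bool shouldSkipScan()
        {
            if (refSlots.length == 0)
                return false;                  // no evidence: do the full scan
            size_t live;
            foreach (slot; refSlots)
            {
                auto p = *cast(void**) slot;   // what the slot holds now
                if (p >= poolBase && p < poolTop)
                    ++live;
            }
            return (cast(double) live) / refSlots.length >= SKIP_THRESHOLD;
        }
    }

For pools holding big allocations you'd presumably want a larger sample 
and a stricter threshold before skipping, per the point about big 
allocations in the paste above.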

There's no other way to guess that the program is in an allocation-heavy 
phase, and specifying "GC.disable()" before making allocations is a 
little too verbose to count as an optimization of the GC.
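
As for the Bayes factor part quoted above, the decision could be as 
simple as a likelihood ratio between a "stable pool" model and a 
"churning pool" model, based on the sampled survival counts. Again just 
an illustration; the 0.95/0.5 survival probabilities and the threshold 
of 10 are invented numbers:

    import std.math : pow;
    import std.stdio : writefln;

    // Likelihood ratio of a "stable pool" model (samples survive with high
    // probability) against a "churning pool" model, given how many of the
    // sampled references still point into the pool.
    double bayesFactor(size_t survivors, size_t samples,
                       double pStable = 0.95, double pChurn = 0.5)
    {
        immutable k = cast(double) survivors;
        immutable n = cast(double) samples;
        immutable likStable = pow(pStable, k) * pow(1 - pStable, n - k);
        immutable likChurn  = pow(pChurn,  k) * pow(1 - pChurn,  n - k);
        return likStable / likChurn;
    }

    void main()
    {
        // e.g. 28 of the 30 sampled references still point into the pool
        immutable K = bayesFactor(28, 30);
        writefln("Bayes factor %.1f -> %s", K,
                 K > 10 ? "skip scanning this pool" : "scan it as usual");
    }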

