Easy & huge GC optimizations

John Colvin via Digitalmars-d digitalmars-d at puremagic.com
Fri May 23 08:41:38 PDT 2014


On Friday, 23 May 2014 at 13:43:53 UTC, Chris wrote:
> On Friday, 23 May 2014 at 06:17:43 UTC, Rainer Schuetze wrote:
>>
>>
>> On 22.05.2014 21:04, Etienne wrote:
>>> On 2014-05-22 2:12 PM, Rainer Schuetze wrote:
>>>>
>>>> "NO_INTERIOR" is currently only used for the hash array used 
>>>> by
>>>> associative arrays. It is a bit dangerous to use as any 
>>>> pointer,slice or
>>>> register still operating on the array is ignored, so 
>>>> collecting it might
>>>> corrupt your memory.
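
To make that hazard concrete, here's a minimal sketch against the
real core.memory API (the allocation size and the dangling-pointer
scenario are invented):

import core.memory;

void main()
{
    // NO_INTERIOR: the GC keeps this block alive only while a
    // pointer to its *base* address exists; pointers into the
    // middle of the block are ignored during marking.
    ubyte* base = cast(ubyte*) GC.malloc(4096, GC.BlkAttr.NO_INTERIOR);

    ubyte* interior = base + 100; // interior pointer: not a root

    base = null;  // drop the only base reference
    GC.collect(); // the block may now be freed, leaving `interior`
                  // dangling: the corruption described above
}
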
>>>
>>> That's quite a relief; I was afraid of having to do it ;)
>>>
>>> I'm currently exploring the possibility of sampling the pointers
>>> during marking to check if they're gone, and using Bayesian
>>> probabilities to decide whether or not to skip the pool.
>>>
>>> I explained it all here:
>>> https://github.com/D-Programming-Language/druntime/pull/797#issuecomment-43896016
>>>
>>>
>>> -- paste --
>>> Basically, when marking, you take 1 in X of the references and
>>> send them to a specific array that represents the pool they
>>> refer to. Then, the next time you're going to collect, you test
>>> them individually, and if they're mostly still there, you skip
>>> marking/freeing for that particular pool during the collection.
>>> You can force collection on certain pools every 1 in X
>>> collections to even out the average lifetime of the references.
>>>
>>> You're going to want a lower tolerance of failure for big
>>> allocations, but basically you're using probabilities to avoid
>>> putting a lot of useless load on the processor, especially when
>>> you're in a part of the application that's just allocating a lot
>>> (sampling will show that the software is not in a phase where
>>> it's releasing data).
>>>
>>> http://en.wikipedia.org/wiki/Bayes_factor
>>>
>>> -- end paste --
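
A minimal sketch of how the pasted scheme could look. Every name
here is invented for illustration; druntime's real pool and marking
internals are quite different:

// Stand-ins so the sketch is self-contained.
struct Pool {}                            // stand-in for a GC pool
bool probeLive(void* r) { return true; }  // stand-in liveness test

enum sampleRate  = 64; // keep 1 of every 64 references seen while marking
enum forcedEvery = 8;  // still fully process each pool every 8th cycle

struct PoolSample
{
    void*[] refs;   // references sampled during the last full mark
    size_t  cycles; // collections since that full mark
}

PoolSample[Pool*] samples;
size_t counter;

// Hook invoked for each reference traced into `pool` while marking.
// (A real version would reset a pool's sample on each full mark.)
void onTraced(Pool* pool, void* r)
{
    if (counter++ % sampleRate == 0)
        samples.require(pool).refs ~= r;
}

// Before a collection, decide per pool whether marking can be skipped.
bool maySkip(Pool* pool)
{
    auto s = pool in samples;
    if (s is null || ++s.cycles % forcedEvery == 0)
        return false; // periodically force a full cycle, as proposed

    size_t live;
    foreach (r; s.refs)
        live += probeLive(r);

    // "Mostly there" is taken as >= 95% of samples still alive; per
    // the proposal, a pool holding big allocations would get a
    // stricter bar.
    return s.refs.length && live * 20 >= s.refs.length * 19;
}
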
>>>
>>> The Bayes factor is merely there to choose the model that best
>>> fits the program. Bayesian inference would take care of deciding
>>> whether a pool should end up being marked. In other words,
>>> machine learning.
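
To make the Bayes-factor part concrete: with two candidate models
for a pool, "stable" vs. "churning", the factor is just the ratio of
how well each model explains the observed survival count. A toy
calculation, with every number invented:

import std.math : pow;
import std.stdio : writefln;

// Unnormalised binomial likelihood of seeing `live` survivors out
// of n samples when each survives with probability p (the binomial
// coefficient cancels in the ratio, so it is left out).
double likelihood(int live, int n, double p)
{
    return pow(p, live) * pow(1.0 - p, n - live);
}

void main()
{
    enum n = 64, live = 62;  // 62 of 64 sampled refs still alive
    enum pStable = 0.98;     // model 1: the pool is stable
    enum pChurn  = 0.60;     // model 2: the pool is churning

    // Bayes factor K = P(data | model 1) / P(data | model 2).
    // K far above 1 favours "stable", i.e. skipping the pool.
    immutable K = likelihood(live, n, pStable) / likelihood(live, n, pChurn);
    writefln("Bayes factor: %.3g", K);
}
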
>>>
>>> Do you think it'd be a good optimization opportunity?
>>
>> Hmm, I guess I don't get the idea. You cannot skip a pool based
>> on some statistics: the skipped pool might hold references into
>> any other pool, and since it isn't scanned, whatever it points to
>> looks unreferenced. As a result you cannot safely collect
>> anything.
>
> I'm not a fan of machine learning, especially not in cases you
> _can_ control, like memory allocation / deallocation. Guessing is
> not a good strategy if you can have control over something.
> Machine learning is only good for vast and unpredictable data
> (voice recognition, for example); there it makes sense to apply
> probability. But if you can actually control what you are doing,
> why would you want to rely on a stupid and blind machine that
> decides things for you based on a probability of n%? Things can
> go wrong and you don't even know why. Mind you, we should rule
> the machines, not the other way around.

Bear in mind here that most code already goes through a whole bunch
of machine-learning-style algorithms in the CPU itself: branch
prediction, for one, is exactly this kind of statistical guessing.
Like it or not, it has proved extremely successful.
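
The simplest instance is the two-bit saturating counter used for
branch prediction. A toy software model of the hardware scheme (real
predictors are far more elaborate; some recent CPUs even use
perceptron-based ones):

// Each branch "learns" its own bias from its recent outcomes.
struct Predictor
{
    ubyte[1024] table; // 2-bit counters, 0..3, indexed by address:
                       // 0,1 predict "not taken"; 2,3 predict "taken"

    bool predict(size_t pc) const
    {
        return table[pc % $] >= 2;
    }

    void update(size_t pc, bool taken)
    {
        ubyte* c = &table[pc % $];
        if (taken && *c < 3) ++*c;   // drift toward "taken"
        if (!taken && *c > 0) --*c;  // drift toward "not taken"
    }
}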

