Easy & huge GC optimizations

Chris via Digitalmars-d digitalmars-d at puremagic.com
Fri May 23 10:29:27 PDT 2014


On Friday, 23 May 2014 at 15:41:39 UTC, John Colvin wrote:
> On Friday, 23 May 2014 at 13:43:53 UTC, Chris wrote:
>> On Friday, 23 May 2014 at 06:17:43 UTC, Rainer Schuetze wrote:
>>>
>>>
>>> On 22.05.2014 21:04, Etienne wrote:
>>>> On 2014-05-22 2:12 PM, Rainer Schuetze wrote:
>>>>>
>>>>> "NO_INTERIOR" is currently only used for the hash array 
>>>>> used by associative arrays. It is a bit dangerous to use, 
>>>>> as any pointer, slice, or register still operating on the 
>>>>> array is ignored, so collecting it might corrupt your 
>>>>> memory.
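[The danger described above can be sketched in a few lines. This is a toy model in Python, not druntime's actual scanner; the names and layout are my own. With NO_INTERIOR semantics, the mark phase only accepts a pointer to the block's base address, so a program that drops the base pointer and keeps only a slice into the middle of the block loses it to collection:]

```python
def mark_roots(roots, blocks, no_interior):
    """Toy mark phase: return the set of block ids considered live.

    roots:       addresses found in stacks/registers/globals
    blocks:      dict of block_id -> (base, size)
    no_interior: set of block ids allocated with NO_INTERIOR
    """
    live = set()
    for addr in roots:
        for bid, (base, size) in blocks.items():
            if bid in no_interior:
                if addr == base:                  # only the base pointer counts
                    live.add(bid)
            elif base <= addr < base + size:      # any interior pointer counts
                live.add(bid)
    return live

blocks = {1: (0x1000, 64)}
# The program kept only a slice into the middle of the block:
roots = [0x1010]

assert mark_roots(roots, blocks, no_interior=set()) == {1}   # kept alive
assert mark_roots(roots, blocks, no_interior={1}) == set()   # collected!
```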
>>>>
>>>> That's quite a relief, I was afraid of having to do it ;)
>>>>
>>>> I'm currently exploring the possibility of sampling the 
>>>> pointers during marking to check if they're gone, and using 
>>>> Bayesian probabilities to decide whether or not to skip the 
>>>> pool.
>>>>
>>>> I explained it all here:
>>>> https://github.com/D-Programming-Language/druntime/pull/797#issuecomment-43896016
>>>>
>>>>
>>>> -- paste --
>>>> Basically, when marking, you take 1 in X of the references 
>>>> and send them to a specific array that represents the pool 
>>>> they refer to. Then, the next time you're going to collect, 
>>>> you test them individually, and if they're mostly still 
>>>> there, you skip marking/freeing for that particular pool 
>>>> during collection. You can force collection on certain 
>>>> pools every 1 in X collections to even out the average 
>>>> lifetime of the references.
>>>>
>>>> You're going to want to have a lower certainty of failure 
>>>> for big allocations, but basically you're using 
>>>> probabilities to avoid pushing a lot of useless load on the 
>>>> processor, especially when you're in a part of an 
>>>> application that's just allocating a lot (sampling will 
>>>> determine that the software is not in a state of data 
>>>> removal).
>>>>
>>>> http://en.wikipedia.org/wiki/Bayes_factor
>>>>
>>>> -- end paste --
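[The pasted proposal can be sketched as follows. This is a minimal Python toy of the idea, not the druntime patch; the constants, class names, and the use of a Beta posterior as the "Bayesian inference" step are my own assumptions. During marking, 1 in X references per pool is sampled; at the next cycle the sample is re-checked, and a pool whose sampled references mostly survived is skipped, with a forced full collection every Nth cycle:]

```python
import random

SAMPLE_RATE = 16        # keep 1 in X references as a sample (hypothetical value)
SKIP_THRESHOLD = 0.95   # skip the pool if estimated survival is high
FORCE_EVERY = 8         # still fully collect each pool every Nth cycle

class PoolStats:
    """Per-pool sample of references and a Beta posterior on survival."""

    def __init__(self):
        self.alpha = 1.0    # prior: Beta(1, 1), i.e. uniform
        self.beta = 1.0
        self.sample = []    # references sampled during the last mark

    def record(self, ref):
        # Called for each reference seen while marking this pool.
        if random.randrange(SAMPLE_RATE) == 0:
            self.sample.append(ref)

    def update(self, still_live):
        # still_live: function ref -> bool, checked at the next cycle.
        for ref in self.sample:
            if still_live(ref):
                self.alpha += 1
            else:
                self.beta += 1
        self.sample.clear()

    def survival_estimate(self):
        # Posterior mean of the survival probability.
        return self.alpha / (self.alpha + self.beta)

def should_skip(stats, cycle):
    if cycle % FORCE_EVERY == 0:
        return False        # forced collection evens out reference lifetimes
    return stats.survival_estimate() > SKIP_THRESHOLD
```

[Whether this heuristic is sound is exactly what the replies below dispute: a skipped pool's outgoing references are never traced.]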
>>>>
>>>> The Bayes factor is merely there to choose the appropriate 
>>>> model that fits the program. Bayesian inference would take 
>>>> care of deciding whether a pool should end up being marked. 
>>>> In other words, machine learning.
>>>>
>>>> Would you think it'd be a good optimization opportunity?
>>>
>>> Hmm, I guess I don't get the idea. You cannot skip a pool 
>>> based on some statistics; you might have references in there 
>>> to anything. As a result, you cannot collect anything.
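[Rainer's objection can be made concrete with a toy object graph (hypothetical, not GC internals): if marking does not trace through a skipped pool, an object in another pool whose only incoming reference lives inside the skipped pool is never reached, and a live object is collected.]

```python
def mark(roots, heap, skipped_pools):
    """Toy mark phase. heap: obj -> (pool, [referenced objs])."""
    marked = set()
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if obj in marked:
            continue
        marked.add(obj)
        pool, refs = heap[obj]
        if pool in skipped_pools:
            continue            # don't trace through the skipped pool
        stack.extend(refs)
    return marked

heap = {
    "a": ("A", ["b"]),   # in pool A; holds the only reference to "b"
    "b": ("B", []),      # in pool B
}

assert mark({"a"}, heap, skipped_pools=set()) == {"a", "b"}
assert "b" not in mark({"a"}, heap, skipped_pools={"A"})   # live "b" lost
```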
>>
>> I'm not a fan of machine learning, especially not in cases you 
>> _can_ control, like memory allocation / deallocation. Guessing 
>> is not a good strategy, if you can have control over 
>> something. Machine learning is only good for vast and 
>> unpredictable data (voice recognition for example). Then it 
>> makes sense to apply probability. But if you can actually 
>> control what you are doing, why would you want to rely on a 
>> stupid and blind machine that decides things for you based on 
>> a probability of n%? Things can go wrong and you don't even 
>> know why. Mind you, we should rule the machines, not the other 
>> way around.
>
> Bear in mind here that most code goes through a whole bunch of 
> machine learning algorithms in the CPU itself. Like it or not, 
> it has proved extremely successful.

What I'm saying is that in cases where you do have control, you 
should not transfer it to the machine. Either you free memory 
yourself with free(), or the GC mechanism is exact and does not 
"assume" things. Guessing could cause inexplicable random bugs. I 
remember that the manual for the GC introduced in Objective-C 
said something like: "Some objects may never be collected." I'm 
not an expert on GC, far from it, but I didn't like the sound of 
it.

I know that CPUs do a good bit of guessing. But that's not the 
same thing. If they err, they make up for it ("Oops, it's not in 
the cache! I'll fetch it from main memory, just a few 
nanoseconds!"). If the GC errs, how do you make up for it? Please 
educate me.
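[The asymmetry behind that question can be sketched with a toy heap (my own naming, not any real collector): a CPU rolls back a bad guess transparently, but a GC that wrongly frees a live object has nothing to roll back; the surviving reference is simply dangling. The only safe direction to err is retention, which is why conservative collectors keep false positives alive rather than risk freeing live data.]

```python
class ToyHeap:
    def __init__(self):
        self.objects = {}           # handle -> value

    def alloc(self, handle, value):
        self.objects[handle] = value

    def collect(self, believed_live):
        # Free everything the collector *believes* is dead.
        for h in list(self.objects):
            if h not in believed_live:
                del self.objects[h]

    def deref(self, handle):
        if handle not in self.objects:
            raise RuntimeError(f"dangling reference: {handle}")
        return self.objects[handle]

heap = ToyHeap()
heap.alloc("x", 42)

# Err toward retention: "x" survives; at worst some garbage does too.
heap.collect(believed_live={"x", "stale_guess"})
assert heap.deref("x") == 42

# Err toward freeing: the mistake is unrecoverable at the next use.
heap.collect(believed_live=set())
try:
    heap.deref("x")
except RuntimeError:
    pass    # the use-after-free surfaces only later, far from the cause
```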
