std.allocator: FreeList uses simple statistics to control number of items
Morbid.Obesity via Digitalmars-d
digitalmars-d at puremagic.com
Wed May 20 23:05:58 PDT 2015
On Wednesday, 20 May 2015 at 17:28:50 UTC, Andrei Alexandrescu
wrote:
> https://github.com/andralex/phobos/commit/90120fc290bc7840ffbee22766798518b3418e15
>
> There is a bothersome issue with freelists fronting
> general-purpose allocators
> (https://en.wikipedia.org/wiki/Free_list): they can grow
> indefinitely. Because they keep memory allocated in their
> parent, they cause fragmentation to it.
>
> The worst pattern for a freelist is: an application allocates a
> bunch of blocks of the specific size the freelist is tuned for,
> then frees most of them and proceeds with allocating other
> sizes. In that case the large freelist is essentially leaked.
>
> My past FreeList implementation had a maximum list length as a
> parameter. That was quite a primitive way to go about it. But
> what I want is a way to model the application's behavior and
> automatically adapt to the allocation patterns. In particular,
> if demand for blocks of the freelist size dwindles, it should
> be possible for the freelist length to progressively go down to
> zero.
>
> So there are three possible allocation events:
>
> 1. The request is for a fit size and the free list is empty.
> That's a miss. It will be served from the parent allocator, and
> upon freeing the block will be added to the free list.
>
> 2. The request is for a fit size and the free list is not
> empty. That's a hit - nice. Serve it from the free list.
>
> 3. The request is for an unfit size. That's always served from
> the parent allocator, and returned to it upon deallocation.
>
> What we want is to minimize the likelihood of a miss. Say over
> the past N allocation requests we have N1, N2, and N3 counts
> for the three events. Then we can estimate the probability of a
> miss from these past events:
>
> Pmiss = N1 / N
>
> Trouble is, all counts keep increasing and there is little
> adaptation to changing patterns - the stats are averaged over
> the entire lifetime. It's better to keep stats over a
> fixed-size rolling window, say the past Nw events. In that
> case, if a new miss event occurs, we update the estimated miss
> probability as such:
>
> Pmiss = (Pmiss * Nw + 1) / (Nw + 1)
>
> That is, in the recent past we've seen Pmiss * Nw misses, now
> we see one more (+ 1), and we divide everything by the
> increased number of events. In case of a non-miss (hit or
> unfit), the update is:
>
> Pmiss = Pmiss * Nw / (Nw + 1)
>
> Note how the first equation converges to Pmiss = 1 and the
> second to Pmiss = 0. Nice.
>
> In a batch of events Nbatch of which there were Nmiss misses,
> the update is:
>
> Pmiss = (Pmiss * Nw + Nmiss) / (Nw + Nbatch)
>
> So after each allocation event, or after a few of them, we have
> a nice estimate of the probability we're missing the
> opportunity of allocating off the freelist. What I'm less sure
> about is how to efficiently wire that information into a
> feedback loop. Specifically:
>
> 1. How to update Pmiss efficiently? Most allocations should be
> fast, so it shouldn't be update with every call to allocate().
> What I have now is I update every K calls.
>
Could you simply not use this in a thread that simply informs
quickly that the free list needs to change it's optimization
strategy?
In the free list allocator, you simply check a flag
allocator()
{
if (needs_size_u)
{
do whatever
}
}
The threaded optimization routine can run as frequency as one
ones and even the frequency could be based on the recent
allocation's per second. (since higher allocations per second
fragment the list faster, optimization would be needed more than
when allocations are low(why keep trying to optimize the list
when no allocations are made?).
The thread could do such things as defragment the free list in
the GC stop the world cycle if it is ever called. If one uses the
GC, the free list could piggyback on it's fragmentation routine
to clean itself up).
> 2. How and when should the estimated Pmiss be used to reduce
> the size of the freelist? What I do now is when Pmiss is below
> a fixed threshold, I just periodically yank one element off of
> it.
>
A. When pmiss is large there are more misses and you can yank in
"proportion".
e.g., addcount = floor(C*(Pmiss^n-1/2))
here we simply fit scale Pmiss(or Phit) into an inter that
represents the count to add or yank(if addcount < 0 we yank,
which is why we subtract 1/2). C and n are constants that give us
our range and "intensity". They can also be optimized, such as
over the longer term use of the program(e.g. over hours of using
the program C and n slowly change to try and minimize Pmiss. this
is a way to actually have the program optimize itself to the
specific way the allocations work in context(both programming and
with the user specific behavior).
e.g., Pmiss = 0, addcount = floor(-C/2). so C represents the
maximum count we want to yank.
Pmiss = 1, addcoun t= floor(C/2) and hence C represents the
maximum to add.
(it happens to be symmetric in this case but does not need to be,
e.g., one could tailor the algorithms so they adapt and change
two different C's instead of one, which will optimize the
algorithm for high vs low memory usage)
B. Simply keep a running average of the previous n then use that
with your threshold. I tend to use these circular history buffers
for statistical analysis as they help smooth out the
data(essentially a low pass filter). This keeps weird issues from
happening(Think how some differential equation scan become
numerically stable and it's possible that your algorithm may
suffer the same issues in some cases(weird allocation
strategies). Averaging it will make it much more stable. Since
you already do this one it is not a huge(e.g., if you only
depending on the previous value then your optimization algorithm
will probably be unstable in some case).
C. You can essentially design whatever statistical measures of
things such as using the standard deviation, as mentioned, the
median, the mean, multi-parameter statistical estimations, etc
and simply form a weighted sum of the rules to and then optimize
the weights in a thread. The "sum" is a measure of many aspects
of how Pmiss behaves.
One can then use the correlation between Pmiss and sum.
(essentially compare there instantaneous directional derivatives)
e.g., if the sum and Pmiss are correlated then one tries to
adjust the weights and see's if they stay correlated or not. If
so, then the weights varied in the correct direction.
Of course, one should take averages the sums and Pmiss's to avoid
instability. (essentially high pass filtering the functions to
avoid any numerical aliasing)
The great thing about any of these methods(and they can be
combined in way you want) is that they can be used to
continuously optimize the the allocation strategies specific to
the exact context they are used in. (e.g., adapt to different
computers, memory layouts and amounts, program usage, etc...)
It happens naturally in these algorithms I'm mentioned(It is a
form of parametric modeling and neural networks)
Another thing is that one can use profiling to help get a
baseline optimization that will probably be significantly better
than the non-adapting version.
The main problem is that the larger number of weights produces
dimensional explosion, which means you'll have to use neural
networks to solve this issue.... which then gets into into nested
neural networks. Since we are using a neural network already to
optimize the program and then have to use another one to optimize
the neural network of the program's coefficients efficiently, it
gets complicated real quick ;)
More information about the Digitalmars-d
mailing list