And here's another interesting algorithm/structure: Randomized Slide to Front

Mon Nov 30 20:07:38 PST 2015

On Monday, 30 November 2015 at 21:33:31 UTC, Andrei Alexandrescu 
wrote:
[...]
> One well-known search strategy is "Bring to front" (described 
> by Knuth in TAoCP). A BtF-organized linear data structure is 
> searched with the classic linear algorithm. The difference is 
> what happens after the search: whenever the search is 
> successful, the found element is brought to the front of the 
> structure. If we're looking most often for a handful of 
> elements, in time these will be near the front of the searched 
> structure.
[...]
> Another idea is to just swap the found element with the one 
> just before it. The logic is, each successful find will shift 
> the element closer to the front, in a bubble sort manner. In 
> time, the frequently searched elements will slowly creep toward 
> the front. The resulting performance is not appealing - you 
> need O(n) searches to bring a given element to the front, for a 
> total of O(n * n) steps spent in the n searches. Meh.
>
> So let's improve on that: whenever an element is found in 
> position k, pick a random number i in the range 0, 1, 2, ..., k 
> inclusive. Then swap the array elements at indexes i and k. 
> This is the Randomized Slide to Front strategy.
>
[...]
> Insertion and removal are both a sweet O(1), owing to the light 
> structuring: to insert just append the element (and perhaps 
> swap it in a random position of the array to prime searching 
> for it). Removal by position simply swaps the last element into 
> the position to be removed and then reduces the size of the 
> array.
[...]
> Andrei

It seems to me you're trying to implement the array based 
equivalent of Splay Trees (Splay Array rhymes, btw). Would that 
be a close enough description?

I'm assuming you're trying to optimize for some distribution 
where a minority of the elements account for the majority of 
queries (say, Zipfian).

Here are some ideas that come to mind. I haven't thought through 
them too much so everyone's welcome to destroy me.

Rather than making index 0 always the front, use some rotating 
technique similar to what ring buffers do.

Say we initially have elements ABCDE (front at 0) and we search 
for C. We swap the left of front (cycling back to the end of the 
array, thus index 4) with the new front. We now have the 
following array at hand: ABEDC, front at 4 (logically CABED).

Obviously we shouldn't change front if the queried element is 
already it.

An immediate problem with this technique is that we'll frequently 
pollute the front of the array with infrequent items. Say these 
are the number of queries made so far for each element: A:7, B:5, 
C:2, all others 0. Also, assume that this is the state of the 
array at this point: DEABC, front at 2. Say we now query for B. 
This is the new state: DBAEC, front at 1 (logically BAECD). 
Having E in front of C is undesirable, so we need a way to avoid 
that.

 From now on I'll refer to indexes as the logical index. That is, 
let i be (front + index) % size. For the sake of brevity, let d 
be the distance between the element and the front = i - front. 
Let q be the number of successful queries performed so far.

What I have in mind boils down to decide between:
- move a newly queried element at logical position i to the left 
of front (technique above). Let's call it move-pre-front for the 
lack of a better name;
- bubble the element up to some position between [0, i), not 
necessarily max(0, i - 1).

Augmenting the array with the number of queries for each element 
would tremendously help the decision making, but I'm assuming 
that's undesirable for a few reasons like:
- the array can't be transparently used in algorithms unaware of 
the structure;
- alignment;
- data bloating.

My immediate thought is to use some heuristic. For instance, say 
we have some threshold k. If d <= k, we bubble up s <= d 
positions to the left, where s could be computed using some 
deterministic formula taking d, q and/or k into account, or just 
randomly (Andrei's RStF). If d > k, we move-pre-front the element.

The threshold k could be computed as a factor of q. Say, sqrt(q), 
log q or log^2 q (logarithm base 2).

Thoughts?

Marcelo