topN using a heap

Sun Jan 17 07:42:29 PST 2016

On 01/17/2016 06:55 AM, Ivan Kazmenko wrote:
> So, what can be done is to introduce TopNStrategy.auto which is the
> default and uses the (current or better) heuristic to switch between
> strategies, but also leave a way to explicitly select one of the two
> strategies, just like one can do now with sort!(...,
> SwapStrategy.whatWeExplicitlyWant).

A nice idea, but it seems a bit overengineered. The differences among 
approaches are rather subtle and explaining the circumstances under 
which one does better than the other is about as difficult as making the 
choice in the implementation.

BTW there's yet another approach:

1. Create a max heap for r[0 .. nth]

2. Create a min heap for r[nth .. $]

3. As long as r[nth] < r[0], swap them and restore the heap property in 
the two heaps

At the end of the process we have the smallest element in r[nth .. $], 
which is exactly in the r[nth] position, greater than or equal to the 
largest element in r[0 .. nth], which is exactly what the doctor prescribed.

Complexity of this turns rather nice. Step 1 is O(k), step 2 is O(n - 
k). In the worst case (r is sorted descending), step 3 will stop after k 
iterations and does k heap insertions in the two heaps. So overall 
complexity is O(n + k log(k) + k log(n)). Since k <= n, keep the largest 
terms: O(n + k log(n)). So if k is proportional to n / log(n), we get 
O(n). And that's worst case!

BTW I figured how to do stable partition. That'll come in a distinct PR.

Andrei