topN using a heap
Andrei Alexandrescu via Digitalmars-d
digitalmars-d at puremagic.com
Sun Jan 17 07:42:29 PST 2016
On 01/17/2016 06:55 AM, Ivan Kazmenko wrote:
> So, what can be done is to introduce TopNStrategy.auto which is the
> default and uses the (current or better) heuristic to switch between
> strategies, but also leave a way to explicitly select one of the two
> strategies, just like one can do now with sort!(...,
A nice idea, but it seems a bit overengineered. The differences among
approaches are rather subtle and explaining the circumstances under
which one does better than the other is about as difficult as making the
choice in the implementation.
BTW there's yet another approach:
1. Create a max heap for r[0 .. nth]
2. Create a min heap for r[nth .. $]
3. As long as r[nth] < r, swap them and restore the heap property in
the two heaps
At the end of the process we have the smallest element in r[nth .. $],
which is exactly in the r[nth] position, greater than or equal to the
largest element in r[0 .. nth], which is exactly what the doctor prescribed.
Complexity of this turns rather nice. Step 1 is O(k), step 2 is O(n -
k). In the worst case (r is sorted descending), step 3 will stop after k
iterations and does k heap insertions in the two heaps. So overall
complexity is O(n + k log(k) + k log(n)). Since k <= n, keep the largest
terms: O(n + k log(n)). So if k is proportional to n / log(n), we get
O(n). And that's worst case!
BTW I figured how to do stable partition. That'll come in a distinct PR.
More information about the Digitalmars-d