topN using a heap

Ivan Kazmenko via Digitalmars-d digitalmars-d at puremagic.com
Sat Jan 16 13:28:22 PST 2016


On Saturday, 16 January 2016 at 15:25:50 UTC, Andrei Alexandrescu 
wrote:
> That's quite a bit of work, so 3934 uses an alternate strategy 
> for finding the smallest 10:
>
> 1. Organize the first 11 elements into a max heap
>
> 2. Scan all other elements progressively. Whenever an element 
> is found that is smaller than the largest in the heap, swap it 
> with the largest in the heap then restore the heap property.
>
> 3. At the end, swap the largest in the heap with the 10th and 
> you're done!
>
> This is very effective, and is seldom referred in the selection 
> literature. In fact, a more inefficient approach (heapify the 
> entire range) is discussed more often.

Yeah, this sounds nice.  Suppose we have an array of length n and 
want to find the smallest k elements.  The first step (heapify) 
is O(k) operations.

1. Let us analyze the rest for "small enough" k.

1a. If the array is "random", the expected number of insertions 
is the sum of probabilities that each new element is inserted.  
This number is (k/(k+1)) + (k/(k+2)) + ... + k/n, which is k 
times a part of harmonic series, that is, on the order of k log 
n.  In each case, O(log k) operations are required to process the 
new element.  In total, the "average" case takes (n-k) + (k log n 
log k), a total of O(n) for small enough values of k.

1b. In the worst case (each element is less than the current 
top), this still requires (n-k) log k operations, so we can say 
the total is O(n log k).

2. For k of the same order as n, the algorithm is O(n log n), 
just as a regular HeapSort.

To summarize:
For k<<n we have O(n) average case and O(n log k) worst case.
For k~n we have O(n log n) as HeapSort.



More information about the Digitalmars-d mailing list