topN using a heap

deadalnix via Digitalmars-d digitalmars-d at
Sun Jan 17 17:07:23 PST 2016

On Saturday, 16 January 2016 at 15:25:50 UTC, Andrei Alexandrescu 
> So, say you're looking for the smallest 10 elements out of 
> 100_000. The quickselect algorithm (which topN currently uses) 
> will successively partition the set in (assuming the pivot 
> choice works well) 50_000, 25_000, etc chunks all the way down 
> to finding the smallest 10.
> That's quite a bit of work, so 3934 uses an alternate strategy 
> for finding the smallest 10:
> 1. Organize the first 11 elements into a max heap
> 2. Scan all other elements progressively. Whenever an element 
> is found that is smaller than the largest in the heap, swap it 
> with the largest in the heap then restore the heap property.
> 3. At the end, swap the largest in the heap with the 10th and 
> you're done!
> This is very effective, and is seldom referred in the selection 
> literature. In fact, a more inefficient approach (heapify the 
> entire range) is discussed more often.
> Destroy!
> Andrei

A common way to do it is to go quicksort, but only recurse on one 
side of the set. That should give log(n)^2 complexity on average.

More information about the Digitalmars-d mailing list