topN using a heap

deadalnix via Digitalmars-d digitalmars-d at puremagic.com
Sun Jan 17 17:07:23 PST 2016


On Saturday, 16 January 2016 at 15:25:50 UTC, Andrei Alexandrescu 
wrote:
> https://github.com/D-Programming-Language/phobos/pull/3934
>
> So, say you're looking for the smallest 10 elements out of 
> 100_000. The quickselect algorithm (which topN currently uses) 
> will successively partition the set in (assuming the pivot 
> choice works well) 50_000, 25_000, etc chunks all the way down 
> to finding the smallest 10.
>
> That's quite a bit of work, so 3934 uses an alternate strategy 
> for finding the smallest 10:
>
> 1. Organize the first 11 elements into a max heap
>
> 2. Scan all other elements progressively. Whenever an element 
> is found that is smaller than the largest in the heap, swap it 
> with the largest in the heap then restore the heap property.
>
> 3. At the end, swap the largest in the heap with the 10th and 
> you're done!
>
> This is very effective, and is seldom referred in the selection 
> literature. In fact, a more inefficient approach (heapify the 
> entire range) is discussed more often.
>
>
> Destroy!
>
> Andrei

A common way to do it is to go quicksort, but only recurse on one 
side of the set. That should give log(n)^2 complexity on average.


More information about the Digitalmars-d mailing list