topN using a heap
deadalnix via Digitalmars-d
digitalmars-d at puremagic.com
Sun Jan 17 17:07:23 PST 2016
On Saturday, 16 January 2016 at 15:25:50 UTC, Andrei Alexandrescu
wrote:
> https://github.com/D-Programming-Language/phobos/pull/3934
>
> So, say you're looking for the smallest 10 elements out of
> 100_000. The quickselect algorithm (which topN currently uses)
> will successively partition the set in (assuming the pivot
> choice works well) 50_000, 25_000, etc chunks all the way down
> to finding the smallest 10.
>
> That's quite a bit of work, so 3934 uses an alternate strategy
> for finding the smallest 10:
>
> 1. Organize the first 11 elements into a max heap
>
> 2. Scan all other elements progressively. Whenever an element
> is found that is smaller than the largest in the heap, swap it
> with the largest in the heap then restore the heap property.
>
> 3. At the end, swap the largest in the heap with the 10th and
> you're done!
>
> This is very effective, and is seldom referred in the selection
> literature. In fact, a more inefficient approach (heapify the
> entire range) is discussed more often.
>
>
> Destroy!
>
> Andrei
A common way to do it is to go quicksort, but only recurse on one
side of the set. That should give log(n)^2 complexity on average.
More information about the Digitalmars-d
mailing list