topN using a heap
Andrei Alexandrescu via Digitalmars-d
digitalmars-d at puremagic.com
Sat Jan 16 07:25:50 PST 2016
https://github.com/D-Programming-Language/phobos/pull/3934
So, say you're looking for the smallest 10 elements out of 100_000. The
quickselect algorithm (which topN currently uses) will successively
partition the set in (assuming the pivot choice works well) 50_000,
25_000, etc chunks all the way down to finding the smallest 10.
That's quite a bit of work, so 3934 uses an alternate strategy for
finding the smallest 10:
1. Organize the first 11 elements into a max heap
2. Scan all other elements progressively. Whenever an element is found
that is smaller than the largest in the heap, swap it with the largest
in the heap then restore the heap property.
3. At the end, swap the largest in the heap with the 10th and you're done!
This is very effective, and is seldom referred in the selection
literature. In fact, a more inefficient approach (heapify the entire
range) is discussed more often.
Destroy!
Andrei
More information about the Digitalmars-d
mailing list