topN using a heap

Ivan Kazmenko via Digitalmars-d digitalmars-d at puremagic.com
Mon Jan 18 18:44:29 PST 2016


On Tuesday, 19 January 2016 at 00:11:40 UTC, Andrei Alexandrescu 
wrote:
> 4. sort was and is attackable before all of these changes
>
> (b) Improve sort() first, then apply a similar strategy to 
> improving topN. Do not use RNGs at all.

Since point 4 is in fact already fixed a good while ago, my 
suggestion would be (b): to do the same for topN.

Introselect (https://en.wikipedia.org/wiki/Introselect), already 
mentioned in this thread, uses median-of-medians to achieve worst 
case O(n) performance if we recurse too deep.  There is already 
an issue suggesting to implement it: 
https://issues.dlang.org/show_bug.cgi?id=12141 (std.algorithm: 
implement deterministic topN).

In fact, the O(n log n) heap approach as it is implemented now 
could be faster than O(n) median-of-medians approach on 
reasonable inputs.  So, someone will have to implement 
median-of-medians and, well, measure.

At the very least, googling for "median of medians in practice" 
and such yields the tag wiki from StackOverflow.com: "The 
constant factor in the O(n) is large, and the algorithm is not 
commonly used in practice.".

Ivan Kazmenko.



More information about the Digitalmars-d mailing list