topN using a heap
Ivan Kazmenko via Digitalmars-d
digitalmars-d at puremagic.com
Mon Jan 18 18:44:29 PST 2016
On Tuesday, 19 January 2016 at 00:11:40 UTC, Andrei Alexandrescu
wrote:
> 4. sort was and is attackable before all of these changes
>
> (b) Improve sort() first, then apply a similar strategy to
> improving topN. Do not use RNGs at all.
Since point 4 is in fact already fixed a good while ago, my
suggestion would be (b): to do the same for topN.
Introselect (https://en.wikipedia.org/wiki/Introselect), already
mentioned in this thread, uses median-of-medians to achieve worst
case O(n) performance if we recurse too deep. There is already
an issue suggesting to implement it:
https://issues.dlang.org/show_bug.cgi?id=12141 (std.algorithm:
implement deterministic topN).
In fact, the O(n log n) heap approach as it is implemented now
could be faster than O(n) median-of-medians approach on
reasonable inputs. So, someone will have to implement
median-of-medians and, well, measure.
At the very least, googling for "median of medians in practice"
and such yields the tag wiki from StackOverflow.com: "The
constant factor in the O(n) is large, and the algorithm is not
commonly used in practice.".
Ivan Kazmenko.
More information about the Digitalmars-d
mailing list