topN using a heap

Ivan Kazmenko via Digitalmars-d digitalmars-d at puremagic.com
Mon Jan 18 04:00:10 PST 2016


On Sunday, 17 January 2016 at 22:20:30 UTC, Andrei Alexandrescu 
wrote:
> On 01/17/2016 03:32 PM, Ivan Kazmenko wrote:
>> Here is a more verbose version.
>
> OK, very nice. Thanks! I've modified topN to work as follows. 
> In a loop:
>
> * If nth <= r.length / log2(r.length)^^2 (or is similarly close 
> to r.length), use topNHeap with one heap and stop
> * If nth <= r.length / log2(r.length) (or is similarly close to 
> r.length), use topNHeap with two heaps and stop
> * Take a quickselect step, adjust r and nth, and continue
>
> That seems to work nicely for all values involved.
>
> All - let me know how things can be further improved. Thx!
>
>
> Andrei

Here goes the test which shows quadratic behavior for the new 
version:
http://dpaste.dzfl.pl/e4b3bc26c3cf
(dpaste kills the slow code before it completes the task)

My local timings with "-O -release -noboundscheck -inline":
building Element array: 4989 msecs
checking Element array: 5018 msecs
checking int array: 626 msecs

With "-debug -unittest":
building Element array: 8384 msecs
checking Element array: 8380 msecs
checking int array: 2987 msecs

If we change the length MAX_N to something observable, say, 50, 
the array is:
[0, 536870911, 2, 536870911, 4, 536870911, 6, 36, 8, 33, 10, 35, 
12, 536870911, 14, 32, 16, 34, 18, 536870911, 536870911, 22, 23, 
21, 20, 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 31, 30, 536870911, 29, 
536870911, 28, 536870911, 27, 536870911, 26, 536870911, 25, 
536870911, 24, 536870911]

The old version (2.070.0-b2) could not be tricked with it, does 
it use random?

The inspiration is the paper "A Killer Adversary for Quicksort":
http://www.cs.dartmouth.edu/~doug/mdmspe.pdf
(I already mentioned it on the forums a while ago)

Ivan Kazmenko.



More information about the Digitalmars-d mailing list