Implementation

Jason Causey via Digitalmars-d digitalmars-d at puremagic.com
Thu Dec 10 15:06:42 PST 2015


Hello all!  I happened by this thread (from Hacker News) and the 
idea of this data structure got stuck in my head.  I did some 
scribbling on paper and decided that it could be interesting to 
code...

On Thursday, 3 December 2015 at 22:44:26 UTC, Andrei Alexandrescu 
wrote:
> At this point I need to either get to the bottom of the math or 
> put together an experimental rig that counts comparisons and 
> swaps.

I've built a test implementation (in C++ ... I hope that isn't 
too distasteful on a D forum, but it's what I'm most comfortable 
with) here:  https://github.com/jcausey-astate/HeapArray

I chose to use a Min-Max heap[1] for the heaps themselves (this 
buys me O(1) min and max access).  This solves the problem of 
making insert and delete follow the same pattern (to insert, 
ripple the max value in a full partition to the next one; in a 
delete, fill in the "hole" with min value from previous 
partition).

I wasn't able to come up with anything better than pre-sorting 
the whole thing, then running the Floyd "make heap" on each 
partition.  So, the whole thing costs O(n*lg(n) + n)) to make a 
new structure on top of an existing array.  This is still faster 
than doing a top-down (add every element) make-heap though.  I 
agree with Timon's analysis[2] on that.

I also agree with Andrei that I have a "gut feeling" that there 
could be a faster setup algorithm, since we really just need to 
shuffle values into the right partitions, not actually fully sort 
them... But that would require knowing exactly what the partition 
"pivot" values were in advance, which would require searching, 
and 'round we go.

My code is still rough, only works for ints, and needs some 
documentation and cleanup, but it does show that our hunches were 
more or less correct:  This thing is much faster than linear for 
searching, but not as fast as logarithmic.  It also performs OK 
when adding new values and performing lots of searches (when 
compared with a plain old array or vector where you have to 
linear search every time).


My Github README has some charts, but I'll link a couple here:
Search times (vector VS this structure VS multiset)
https://plot.ly/~jcausey-astate/7/search-timing-vector-vs-heaparray-vs-multiset/

Time to add N values starting with an empty structure (vector VS 
this structure VS multiset)
https://plot.ly/~jcausey-astate/18/fill-container-dynamically-heaparray-vs-vector-and-multiset/

Time to initially build the structure given an already-populated 
array (vector VS this structure VS multiset):
https://plot.ly/~jcausey-astate/35/build-data-structure-on-existing-array-all-at-once/


[1]: 
http://www.akira.ruc.dk/~keld/teaching/algoritmedesign_f03/Artikler/02../Atkinson86.pdf

[2]: http://forum.dlang.org/post/n3qqkm$2c6t$1@digitalmars.com


More information about the Digitalmars-d mailing list