An interesting data structure with search time O(sqrt n)
Andrei Alexandrescu via Digitalmars-d
digitalmars-d at puremagic.com
Tue Dec 1 14:48:56 PST 2015
On 12/01/2015 12:13 AM, Andrei Alexandrescu wrote:
> On 11/30/15 9:47 PM, Timon Gehr wrote:
>> On 12/01/2015 03:33 AM, Timon Gehr wrote:
>>> On 11/30/2015 09:57 PM, Andrei Alexandrescu wrote:
>>>> So now consider my square heaps. We have O(n) build time (just a bunch
>>>> of heapifications) and O(sqrt n) search.
>>>
>>> How do you build in O(n)? (The initial array is assumed to be completely
>>> unordered, afaict.)
>>
>> (I meant to say: There aren't any assumptions on the initial ordering of
>> the array elements.)
>
> That's quite challenging. (My O(n) estimate was off the cuff and
> possibly wrong.) Creating the structure entails simultaneously solving
> the selection problem (find the k smallest elements) for several values
> of k. I'll post here if I come up with something. -- Andrei
OK, I think I have an answer to this in the form of an efficient algorithm.
First off: sizes 1+3+5+7+... seem a great choice, I'll use that for the
initial implementation (thanks Titus!).
Second: the whole max-heap business is a red herring - a min-heap is just as
good, and in fact better. When searching, overshoot by one heap, then step
back one heap to the left and do the final linear search there.
So the structure we're looking at is an array of adjacent min-heaps of
sizes 1, 3, 5, etc. The heaps are ordered (the maximum of heap k is less
than or equal to the minimum of heap k+1). The question is how to build
such an array of heaps in place, starting from an unstructured array of
size n.
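For concreteness, the index arithmetic is pleasant: since 1 + 3 + ... + (2k-1) = k^2, heap k (counting from 0, of size 2k+1) occupies indices [k^2, (k+1)^2) of the array, and its root - its minimum - sits at index k^2. A quick sketch of that layout (in Python rather than D, purely for illustration; it assumes n is a perfect square so the last heap is complete):

```python
import math

def heap_start(k):
    # Heap k begins where the first k heaps end: 1 + 3 + ... + (2k-1) = k*k.
    return k * k

def heap_of(g):
    # Which heap holds global index g?  The unique k with k*k <= g < (k+1)**2.
    return math.isqrt(g)

# n = 9 splits into heaps of sizes 1, 3, 5 at offsets 0, 1, 4.
assert [heap_start(k) for k in range(3)] == [0, 1, 4]
assert [heap_of(g) for g in range(9)] == [0, 1, 1, 1, 2, 2, 2, 2, 2]
```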
One simple approach is to just sort the array in O(n log n). This
satisfies all properties - all adjacent subsequences are obviously
ordered, and any subsequence has the min heap property. As an
engineering approach we may as well stop here - sorting is a widely
studied and well-implemented algorithm. However, we hope to get away
with less work because we don't quite need full sorting.
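As a baseline, the sort-based construction plus a checker for the two invariants (each slice is a min-heap; adjacent slices are ordered) fits in a few lines. A Python sketch, again assuming n is a perfect square per the k^2 layout:

```python
import math

def is_valid(a):
    """Check both invariants: each slice a[k*k:(k+1)**2] is a min-heap,
    and max(heap k) <= min(heap k+1).  Assumes len(a) is a square."""
    K = math.isqrt(len(a))
    prev_max = None
    for k in range(K):
        h = a[k * k:(k + 1) ** 2]
        for i in range(len(h)):
            for c in (2 * i + 1, 2 * i + 2):      # local binary-heap children
                if c < len(h) and h[c] < h[i]:
                    return False
        if prev_max is not None and h[0] < prev_max:  # h[0] is this heap's min
            return False
        prev_max = max(h)
    return True

def build_by_sorting(a):
    a.sort()   # O(n log n); a sorted array trivially satisfies both invariants
    return a

assert is_valid(build_by_sorting([8, 3, 5, 1, 9, 2, 7, 4, 6]))
assert not is_valid([5, 1, 2, 3, 4, 6, 7, 8, 9])  # 5 exceeds next heap's min
```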
Here's the intuition: the collection of heaps can be seen as one large
heap that has a DAG structure (as opposed to a tree). In the DAG, the
root of heap k+1 is the child of all leaves of heap k (see
http://imgur.com/I366GYS which shows the DAG for the 1, 3, 5, and 7 heaps).
Clearly getting this structure to respect the heap property is all
that's needed for everything to work - so we simply apply the classic
heapify algorithm to it. It seems it can be applied almost unchanged -
starting from the end, sift each element down the DAG.
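Here is my reading of that sift-down as a Python sketch (not D, just for illustration): within heap k the children are the usual binary-heap ones; the extra DAG edges make the root of heap k+1 a child of every leaf of heap k (local indices k..2k). It assumes n is a perfect square so every heap is complete:

```python
import math

def build_dag_heap(a):
    """Heapify the DAG of adjacent min-heaps by sifting each element
    down, starting from the end of the array (a sketch, not tuned)."""
    n = len(a)
    assert math.isqrt(n) ** 2 == n   # all heaps complete

    def children(g):
        # Global DAG children of global index g.
        k = math.isqrt(g)            # g lies in heap k: k*k <= g < (k+1)**2
        i = g - k * k                # local index within heap k
        out = [k * k + c for c in (2 * i + 1, 2 * i + 2) if c < 2 * k + 1]
        if i >= k and (k + 1) ** 2 < n:      # leaves link to the next root
            out.append((k + 1) ** 2)
        return out

    def sift_down(g):
        while True:
            cs = children(g)
            if not cs:
                return
            m = min(cs, key=lambda c: a[c])  # smallest child in the DAG
            if a[m] >= a[g]:
                return
            a[g], a[m] = a[m], a[g]
            g = m

    for g in reversed(range(n)):
        sift_down(g)
    return a
```

On a shuffled 9-element array this yields heaps of sizes 1, 3, 5 with ascending roots, satisfying both invariants; whether it is genuinely near-O(n) is exactly the open question above.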
This looks efficient and minimal; I doubt there's any redundant work.
However, getting bounds for complexity of this will be tricky. Classic
heapify is tricky, too - it seems to have complexity O(n log n) but in
fact has complexity O(n) - see nice discussion at
http://stackoverflow.com/questions/9755721/how-can-building-a-heap-be-on-time-complexity.
When applying heapify to the DAG, there are more restrictions and the
paths are longer, so a sliver more than O(n) is expected.
Anyway, this looks ready for a preliminary implementation and some more
serious calculations.
One more interesting thing: the heap heads are sorted, so when
searching, the heap corresponding to the searched item can be found
using binary search. That makes that part of the search essentially
negligible - the lion's share will be the linear search on the last
mile. In turn, that suggests that more, smaller heaps would be a better
choice. (At the extreme, with an array of heaps each of size proportional
to log(n), we get search time O(log n) even though the array is not
entirely sorted.)
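The whole search then looks something like this Python sketch (illustrative only; it assumes distinct keys and n a perfect square): binary-search the sorted roots a[0], a[1], a[4], ..., overshoot to the first root exceeding the key, step back one heap, and scan it linearly in O(sqrt n):

```python
import math

def search(a, x):
    """Return a global index of x in the array-of-heaps structure,
    or -1 if absent (sketch; assumes distinct keys, square n)."""
    n = len(a)
    K = math.isqrt(n)
    lo, hi = 0, K                    # find first heap whose root exceeds x
    while lo < hi:
        mid = (lo + hi) // 2
        if a[mid * mid] <= x:        # root of heap mid sits at mid*mid
            lo = mid + 1
        else:
            hi = mid
    k = lo - 1                       # overshot by one; step back one heap
    if k < 0:
        return -1                    # x is below the global minimum
    for g in range(k * k, (k + 1) ** 2):   # O(sqrt n) scan of one heap
        if a[g] == x:
            return g
    return -1
```

With heaps of size ~sqrt(n) the binary search touches O(log n) roots, so, as noted, the final linear scan dominates.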
Andrei