An interesting data structure with search time O(sqrt n)

Andrei Alexandrescu via Digitalmars-d digitalmars-d at puremagic.com
Tue Dec 1 14:48:56 PST 2015


On 12/01/2015 12:13 AM, Andrei Alexandrescu wrote:
> On 11/30/15 9:47 PM, Timon Gehr wrote:
>> On 12/01/2015 03:33 AM, Timon Gehr wrote:
>>> On 11/30/2015 09:57 PM, Andrei Alexandrescu wrote:
>>>> So now consider my square heaps. We have O(n) build time (just a bunch
>>>> of heapifications) and O(sqrt n) search.
>>>
>>> How do you build in O(n)? (The initial array is assumed to be completely
>>> unordered, afaict.)
>>
>> (I meant to say: There aren't any assumptions on the initial ordering of
>> the array elements.)
>
> That's quite challenging. (My O(n) estimate was off the cuff and
> possibly wrong.) Creating the structure entails simultaneously solving
> the selection problem (find the k smallest elements) for several values
> of k. I'll post here if I come up with something. -- Andrei

OK, I think I have an answer to this in the form of an efficient algorithm.

First off: sizes 1+3+5+7+... seem a great choice, I'll use that for the 
initial implementation (thanks Titus!).

Second: the whole max-heap idea is a red herring - a min-heap is just as 
good, and in fact better. When searching, overshoot by one heap (stop at 
the first root greater than the needle), then go back one heap to the 
left and do the final linear search in there.

So the structure we're looking at is an array of adjacent min-heaps of 
sizes 1, 3, 5, etc. The heaps are ordered (the maximum of heap k is less 
than or equal to the minimum of heap k+1). The question is how to build 
such an array of heaps in place, starting from an unstructured array of 
size n.
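To make the layout concrete, here's a quick Python sketch (used purely for illustration; the helper names are mine). Since 1 + 3 + ... + (2k - 1) = k^2, heap k (counting from 0) has size 2k + 1 and occupies the half-open index range [k^2, (k+1)^2):

```python
import math

def heap_bounds(k):
    """Heap k (0-based) has size 2*k + 1 and starts at index k*k,
    because 1 + 3 + ... + (2*k - 1) = k*k."""
    return k * k, (k + 1) * (k + 1)     # half-open [start, end)

def heap_of(i):
    """Which heap does array index i fall into?"""
    return math.isqrt(i)

# n = 16 elements split into heaps of sizes 1, 3, 5, 7
assert [heap_bounds(k) for k in range(4)] == [(0, 1), (1, 4), (4, 9), (9, 16)]
assert heap_of(3) == 1 and heap_of(4) == 2
```

The nice consequence is that both directions of the mapping are O(1) integer arithmetic, so no metadata beyond the array itself is needed.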

One simple approach is to just sort the array in O(n log n). This 
satisfies all the properties: the heaps are ordered relative to one 
another, and any contiguous subarray of a sorted array trivially has the 
min-heap property. As an engineering matter we could stop right here - 
sorting is a widely studied and well-implemented algorithm. However, we 
hope to get away with less work because we don't quite need full sorting.
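As a baseline, here's the sort-based build next to a checker for the two invariants, in a Python sketch (the function names are mine):

```python
def build_by_sorting(a):
    """Baseline O(n log n) build: a sorted array trivially satisfies both
    the per-heap min-heap property and the ordering between heaps."""
    a.sort()
    return a

def invariants_hold(a):
    """Check both properties: each odd-sized chunk is a min-heap, and the
    max of heap k does not exceed the root (minimum) of heap k+1."""
    k = 0
    while k * k < len(a):
        lo, hi = k * k, min((k + 1) * (k + 1), len(a))
        # min-heap property inside heap k: parent at i, children at
        # lo + 2*(i - lo) + 1 and lo + 2*(i - lo) + 2
        for i in range(lo, hi):
            for c in (lo + 2 * (i - lo) + 1, lo + 2 * (i - lo) + 2):
                if c < hi and a[i] > a[c]:
                    return False
        # max of this heap <= root of the next heap
        if hi < len(a) and max(a[lo:hi]) > a[hi]:
            return False
        k += 1
    return True

assert invariants_hold(build_by_sorting([5, 2, 9, 1, 7, 3, 8, 0, 6]))
```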

Here's the intuition: the collection of heaps can be seen as one large 
heap that has a DAG structure (as opposed to a tree). In the DAG, the 
root of heap k+1 is a child of every leaf of heap k (see 
http://imgur.com/I366GYS which shows the DAG for the heaps of sizes 1, 
3, 5, and 7).

Clearly getting this structure to respect the heap property is all 
that's needed for everything to work - so we simply apply the classic 
heapify algorithm to it. It seems it can be applied almost unchanged - 
starting from the end, sift each element down the DAG.
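Here is how that sift might look in code - a Python sketch of my reading of the scheme, not a tuned implementation (helper names are mine). The only change versus classic sift-down is the child computation: a leaf of heap k gets one extra child, the root of heap k+1.

```python
import math

def dag_children(i, n):
    """Children of index i in the DAG view of adjacent min-heaps of
    sizes 1, 3, 5, ...; heap k occupies indices [k*k, (k+1)*(k+1))."""
    k = math.isqrt(i)
    lo, hi = k * k, (k + 1) * (k + 1)
    j = i - lo                                # local index inside heap k
    kids = [c for c in (lo + 2 * j + 1, lo + 2 * j + 2) if c < min(hi, n)]
    if not kids and hi < n:
        kids = [hi]                           # leaf of heap k: its one extra
    return kids                               # child is the root of heap k+1

def sift_down(a, i):
    """Move a[i] down the DAG until it is no larger than its children."""
    while True:
        kids = dag_children(i, len(a))
        if not kids:
            return
        c = min(kids, key=lambda j: a[j])     # smallest child
        if a[i] <= a[c]:
            return
        a[i], a[c] = a[c], a[i]
        i = c

def dag_heapify(a):
    """Classic bottom-up heapify applied to the DAG: starting from the
    end of the array, sift each element down."""
    for i in reversed(range(len(a))):
        sift_down(a, i)
    return a
```

Note that a sift can cross heap boundaries: a value pushed out of a leaf of heap k continues sinking through heap k+1, k+2, and so on, which is exactly where the extra path length beyond classic heapify comes from.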

This looks efficient and minimal; I doubt there's any redundant work. 
However, getting bounds for complexity of this will be tricky. Classic 
heapify is tricky, too - it seems to have complexity O(n log n) but in 
fact has complexity O(n) - see nice discussion at 
http://stackoverflow.com/questions/9755721/how-can-building-a-heap-be-on-time-complexity. 
When applying heapify to the DAG there are more restrictions and the 
paths are longer, so a sliver more than O(n) is to be expected.

Anyway, this looks ready for a preliminary implementation and some more 
serious calculations.

One more interesting thing: the heap heads are sorted, so when 
searching, the heap corresponding to the searched item can be found 
using binary search. That makes that part of the search essentially 
negligible - the lion's share will be the linear search on the last 
mile. In turn, that suggests that more, smaller heaps would be a better 
choice. (At an extreme, if each heap has size proportional to log(n), we 
get search time O(log n) even though the array is not entirely sorted.)
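In code, that search might look like this (a Python sketch, names mine; it assumes the layout and invariants described above). The roots sit at indices 0, 1, 4, 9, ..., so a binary search over them finds the single candidate heap, and a linear scan of its 2k + 1 = O(sqrt n) elements finishes the job:

```python
import math

def search(a, x):
    """Return an index i with a[i] == x, or -1. Assumes a is laid out as
    adjacent min-heaps of sizes 1, 3, 5, ... with sorted roots."""
    nheaps = math.isqrt(len(a) - 1) + 1 if a else 0
    # binary search for the last heap whose root (its minimum) is <= x
    lo, hi, k = 0, nheaps - 1, -1
    while lo <= hi:
        mid = (lo + hi) // 2
        if a[mid * mid] <= x:
            k, lo = mid, mid + 1
        else:
            hi = mid - 1
    if k < 0:
        return -1                   # x is smaller than every element
    # linear scan of the one candidate heap: O(sqrt n) elements
    start, end = k * k, min((k + 1) * (k + 1), len(a))
    for i in range(start, end):
        if a[i] == x:
            return i
    return -1
```

Because heap k's maximum is at most heap k+1's root, an element equal to x cannot live in any heap past the last one whose root is <= x, which is what makes the binary search over roots sound.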


Andrei


