An interesting data structure with search time O(sqrt n)

Tue Dec 1 21:58:24 PST 2015

On Tuesday, 1 December 2015 at 22:48:56 UTC, Andrei Alexandrescu 
wrote:
> On 12/01/2015 12:13 AM, Andrei Alexandrescu wrote:
>> On 11/30/15 9:47 PM, Timon Gehr wrote:
>>> On 12/01/2015 03:33 AM, Timon Gehr wrote:
>>>> On 11/30/2015 09:57 PM, Andrei Alexandrescu wrote:
>>>>> So now consider my square heaps. We have O(n) build time 
>>>>> (just a bunch
>>>>> of heapifications) and O(sqrt n) search.
>>>>
>>>> How do you build in O(n)? (The initial array is assumed to 
>>>> be completely
>>>> unordered, afaict.)
>>>
>>> (I meant to say: There aren't any assumptions on the initial 
>>> ordering of
>>> the array elements.)
>>
>> That's quite challenging. (My O(n) estimate was off the cuff 
>> and
>> possibly wrong.) Creating the structure entails simultaneously 
>> solving
>> the selection problem (find the k smallest elements) for 
>> several values
>> of k. I'll post here if I come up with something. -- Andrei
>
> OK, I think I have an answer to this in the form of an 
> efficient algorithm.
>
> First off: sizes 1+3+5+7+... seem a great choice, I'll use that 
> for the initial implementation (thanks Titus!).
>
> Second: the whole max heap is a red herring - min heap is just 
> as good, and in fact better. When doing the search just 
> overshoot by one then go back one heap to the left and do the 
> final linear search in there.
>
> So the structure we're looking at is an array of adjacent 
> min-heaps of sizes 1, 3, 5, etc. The heaps are ordered (the 
> maximum of heap k is less than or equal to the minimum of heap 
> k+1). Question is how do we build such an array of heaps in 
> place starting from an unstructured array of size n.
>
> One simple approach is to just sort the array in O(n log n). 
> This satisfies all properties - all adjacent subsequences are 
> obviously ordered, and any subsequence has the min heap 
> property. As an engineering approach we may as well stop here - 
> sorting is a widely studied and well implemented algorithm. 
> However, we hope to get away with less work because we don't 
> quite need full sorting.
>
> Here's the intuition: the collection of heaps can be seen as 
> one large heap that has a DAG structure (as opposed to a tree). 
> In the DAG, the root of heap k+1 is the child of all leaves of 
> heap k (see http://imgur.com/I366GYS which shows the DAG for 
> the 1, 3, 7, and 7 heaps).
>
> Clearly getting this structure to respect the heap property is 
> all that's needed for everything to work - so we simply apply 
> the classic heapify algorithm to it. It seems it can be applied 
> almost unchanged - starting from the end, sift each element 
> down the DAG.
>
> This looks efficient and minimal; I doubt there's any redundant 
> work. However, getting bounds for complexity of this will be 
> tricky. Classic heapify is tricky, too - it seems to have 
> complexity O(n log n) but in fact has complexity O(n) - see 
> nice discussion at 
> http://stackoverflow.com/questions/9755721/how-can-building-a-heap-be-on-time-complexity. When applying heapify to the DAG, there's more restrictions and the paths are longer, so a sliver more than O(n) is expected.
>
> Anyway, this looks ready for a preliminary implementation and 
> some more serious calculations.
>
> One more interesting thing: the heap heads are sorted, so when 
> searching, the heap corresponding to the searched item can be 
> found using binary search. That makes that part of the search 
> essentially negligible - the lion's share will be the linear 
> search on the last mile. In turn, that suggests that more heaps 
> that are smaller would be a better choice. (At an extreme, if 
> we have an array of heaps each proportional to log(n), then we 
> get search time O(log n) even though the array is not entirely 
> sorted.)
>
>
> Andrei

Nice to see this interesting post and learn.

  I have a few questions.

1) This is offline datastructure since you don't know how the 
elements of the future are going to be ie dynamic. ie later 
elements from n to 2n can break or change your heaps as such in 
worst case or is it a dynamic data structure ?

2)  Searching in min or max heaps is bad isn't it ? Lets say we 
choose max heaps. Now we have the root as 10^9 in the second last 
heap ie around n^2 elements.  The children of it are 4*10^8 and 
5*10^8 . If i'm searching for say 4.5 *10^8 my job is easy but if 
i'm searching for 1000, i have to search in both the subtrees and 
it goes linear and becomes around n^2 in the worst case. Did i 
overlook anything ?

Instead of heaps, a single sorted array or breaking them into a 
series of sorted arrays ie skip lists kind of stuff  would be 
fine if it just want a offline Data structure ?

or is this some domain specific data Structure where you only/cre 
want max/min in some sequence ?