An interesting data structure with search time O(sqrt n)
Navin via Digitalmars-d
digitalmars-d at puremagic.com
Tue Dec 1 21:58:24 PST 2015
On Tuesday, 1 December 2015 at 22:48:56 UTC, Andrei Alexandrescu
wrote:
> On 12/01/2015 12:13 AM, Andrei Alexandrescu wrote:
>> On 11/30/15 9:47 PM, Timon Gehr wrote:
>>> On 12/01/2015 03:33 AM, Timon Gehr wrote:
>>>> On 11/30/2015 09:57 PM, Andrei Alexandrescu wrote:
>>>>> So now consider my square heaps. We have O(n) build time
>>>>> (just a bunch
>>>>> of heapifications) and O(sqrt n) search.
>>>>
>>>> How do you build in O(n)? (The initial array is assumed to
>>>> be completely
>>>> unordered, afaict.)
>>>
>>> (I meant to say: There aren't any assumptions on the initial
>>> ordering of
>>> the array elements.)
>>
>> That's quite challenging. (My O(n) estimate was off the cuff
>> and
>> possibly wrong.) Creating the structure entails simultaneously
>> solving
>> the selection problem (find the k smallest elements) for
>> several values
>> of k. I'll post here if I come up with something. -- Andrei
>
> OK, I think I have an answer to this in the form of an
> efficient algorithm.
>
> First off: sizes 1+3+5+7+... seem a great choice, I'll use that
> for the initial implementation (thanks Titus!).
>
> Second: the whole max heap is a red herring - min heap is just
> as good, and in fact better. When doing the search just
> overshoot by one then go back one heap to the left and do the
> final linear search in there.
>
> So the structure we're looking at is an array of adjacent
> min-heaps of sizes 1, 3, 5, etc. The heaps are ordered (the
> maximum of heap k is less than or equal to the minimum of heap
> k+1). Question is how do we build such an array of heaps in
> place starting from an unstructured array of size n.
>
> One simple approach is to just sort the array in O(n log n).
> This satisfies all properties - all adjacent subsequences are
> obviously ordered, and any subsequence has the min heap
> property. As an engineering approach we may as well stop here -
> sorting is a widely studied and well implemented algorithm.
> However, we hope to get away with less work because we don't
> quite need full sorting.
>
> Here's the intuition: the collection of heaps can be seen as
> one large heap that has a DAG structure (as opposed to a tree).
> In the DAG, the root of heap k+1 is the child of all leaves of
> heap k (see http://imgur.com/I366GYS which shows the DAG for
> the 1, 3, 7, and 7 heaps).
>
> Clearly getting this structure to respect the heap property is
> all that's needed for everything to work - so we simply apply
> the classic heapify algorithm to it. It seems it can be applied
> almost unchanged - starting from the end, sift each element
> down the DAG.
>
> This looks efficient and minimal; I doubt there's any redundant
> work. However, getting bounds for complexity of this will be
> tricky. Classic heapify is tricky, too - it seems to have
> complexity O(n log n) but in fact has complexity O(n) - see
> nice discussion at
> http://stackoverflow.com/questions/9755721/how-can-building-a-heap-be-on-time-complexity. When applying heapify to the DAG, there's more restrictions and the paths are longer, so a sliver more than O(n) is expected.
>
> Anyway, this looks ready for a preliminary implementation and
> some more serious calculations.
>
> One more interesting thing: the heap heads are sorted, so when
> searching, the heap corresponding to the searched item can be
> found using binary search. That makes that part of the search
> essentially negligible - the lion's share will be the linear
> search on the last mile. In turn, that suggests that more heaps
> that are smaller would be a better choice. (At an extreme, if
> we have an array of heaps each proportional to log(n), then we
> get search time O(log n) even though the array is not entirely
> sorted.)
>
>
> Andrei
Nice to see this interesting post and learn.
I have a few questions.
1) This is offline datastructure since you don't know how the
elements of the future are going to be ie dynamic. ie later
elements from n to 2n can break or change your heaps as such in
worst case or is it a dynamic data structure ?
2) Searching in min or max heaps is bad isn't it ? Lets say we
choose max heaps. Now we have the root as 10^9 in the second last
heap ie around n^2 elements. The children of it are 4*10^8 and
5*10^8 . If i'm searching for say 4.5 *10^8 my job is easy but if
i'm searching for 1000, i have to search in both the subtrees and
it goes linear and becomes around n^2 in the worst case. Did i
overlook anything ?
Instead of heaps, a single sorted array or breaking them into a
series of sorted arrays ie skip lists kind of stuff would be
fine if it just want a offline Data structure ?
or is this some domain specific data Structure where you only/cre
want max/min in some sequence ?
More information about the Digitalmars-d
mailing list