An interesting data structure with search time O(sqrt n)

Andrei Alexandrescu via Digitalmars-d digitalmars-d at puremagic.com
Mon Nov 30 12:13:11 PST 2015


Okasaki's book is a continued inspiration of data structures and algorithms.

I was thinking along the following lines: typical collections are 
searchable in linear time. Then we have highly structured collections 
that feature logarithmic search time. But there seems to be nothing in 
between. So I was thinking, what would be a data structure that allows 
O(sqrt n) search times?

After a little tinkering, here's what I came up with.

Start with a simple array of data. Then mentally decompose that array 
into a concatenation of smaller arrays: first has size 1, second has 
size 4, third has size 9, then 16, 25, 36, ... Generally the size of 
these imaginary subarrays grows quadratically. And they're adjacent to 
each other. The last array may be incomplete.

Example: we decompose an array of size 35 into: an array of size 1 
followed by arrays of sizes 4, 9, 16, 5 (last being an incomplete 
fragment of an array of size 25).

Now each of these arrays we organize as a max heap. Moreover, we arrange 
data such that the maximums of these heaps are in INCREASING order. That 
means the smallest element of the entire (initial) array is at the first 
position, then followed by the next 4 smallest organized in a max heap, 
followed by the next 9 smallest organized in a max heap, ... etc. There 
are a total of O(sqrt n) heaps, and each has O(sqrt n) elements.

That's the layout! Now, to search we look at one heap at a time. 
Whenever the maximum element (first element in the subarray) is smaller 
than the searched value, we skip over that entire heap and go to the 
next one. In the worst case we'll skip O(sqrt n) heaps. When the max 
value in a heap is less than the searched element, we found the heap and 
we run a linear search among O(sqrt n) elements.

The structure is nice for sorting and in particular parallel sorting 
because each subarray can be sorted independently - there's no migration 
into or out of each subarray. Inside each subarray, of course heapsort 
would be a good choice because it takes advantage of the existing max 
heap. 	

I haven't worked out the math for insertion and deletion yet, but they 
seem to also be around O(sqrt n) or O(log(n) * sqrt(n)). Just wanted to 
share with you and discuss what seems to be an interesting data structure.

Please share any thoughts!


Andrei


More information about the Digitalmars-d mailing list