An interesting data structure with search time O(sqrt n)

Andrei Alexandrescu via Digitalmars-d digitalmars-d at puremagic.com
Mon Nov 30 12:57:24 PST 2015


On 11/30/15 3:20 PM, H. S. Teoh via Digitalmars-d wrote:
> On Mon, Nov 30, 2015 at 03:13:11PM -0500, Andrei Alexandrescu via Digitalmars-d wrote:
>> Okasaki's book is a continued inspiration of data structures and
>> algorithms.
>>
>> I was thinking along the following lines: typical collections are
>> searchable in linear time. Then we have highly structured collections
>> that feature logarithmic search time. But there seems to be nothing in
>> between. So I was thinking, what would be a data structure that allows
>> O(sqrt n) search times?
>>
>> After a little tinkering, here's what I came up with.
>
> Interesting indeed.
>
> It leaves me wondering, though, what's the point of having an O(sqrt n)
> collection? What are the advantages?  Why would I use this structure
> instead of, say, a traditional array heap with O(log n) search time?

(Heaps offer only linear search time. You may take advantage of the heap 
structure to skip portions of the array, but on average and in the worst 
case the search is still O(n). So I assume you meant "sorted array or 
one of the classic search trees".)
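
To illustrate: a quick sketch (illustrative D, not anybody's actual
implementation) of searching an array laid out as a max-heap. You can
prune subtrees whose root is already smaller than the key, but in the
worst case the search still visits nearly every element.

bool heapSearch(T)(const T[] heap, T key, size_t i = 0)
{
    // Prune: in a max-heap, nothing below a root smaller than key can match.
    if (i >= heap.length || heap[i] < key) return false;
    if (heap[i] == key) return true;
    return heapSearch(heap, key, 2 * i + 1) || heapSearch(heap, key, 2 * i + 2);
}

void main()
{
    int[] a = [9, 7, 8, 3, 5, 6, 4]; // a valid max-heap laid out in an array
    assert(heapSearch(a, 5));
    assert(!heapSearch(a, 2));
    // A small key prunes a lot; a key near the maximum can still touch
    // almost every element, hence O(n) worst case.
}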

The motivation starts with a desire to use arrays as the fundamental 
layout. Among the trends in computing as of late: cache hierarchies are 
here to stay, and contiguous layout is preferable.

The short of it is, arrays are king. No two ways about it - following 
pointers is a losing strategy when there's an alternative. A variety of 
new data structures (Clojure's arrays, heaps with tombstones) eschew 
classic pointer-based designs in favor of adding structure on top of 
arrays.

So now, if we think about how to organize an array for good search 
performance, we have a spectrum. Generally we also care about insertion 
and removal.

At one end of the spectrum there's doing nothing at all - that means 
O(1) build (nothing to do), O(n) search, O(1) insert (just append it), 
O(n) removal. Not very nice.
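
Roughly, as a trivial sketch (the standard library calls are just for 
illustration):

import std.algorithm.searching : canFind;

void main()
{
    int[] a;               // O(1) build: nothing to do
    a ~= 42;               // O(1) insert: append at the end
    assert(a.canFind(42)); // O(n) search: scan the whole array
    // O(n) removal: locate the element, then shift the tail left by one slot.
}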

At the other end, the absolute best structuring on top of an array for 
search is sorting. With sorting you get great O(log n) search 
performance. But the other operations are not nice - O(n log n) build, 
O(n) insert, O(n) remove.
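
Again as a rough sketch:

import std.algorithm.sorting : sort;
import std.range : assumeSorted;

void main()
{
    int[] a = [5, 1, 4, 2, 3];
    sort(a);                            // O(n log n) build
    assert(a.assumeSorted.contains(3)); // O(log n) search (binary search)
    // O(n) insert/remove: the position is found in O(log n), but everything
    // after it must shift by one slot.
}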

So now consider my square heaps. We have O(n) build time (just a bunch 
of heapifications) and O(sqrt n) search. Then (again, I haven't worked 
out the math yet) let's assume insertion and removal are both O(sqrt n). 
If so, you get something less structured than full sorting, but still 
structured enough to offer the same O(sqrt n) complexity for each of 
search, insert, and delete. That would be pretty neat.
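
To make the search concrete, here is a rough sketch. It assumes the 
array is carved into max-heap blocks of sizes 1, 3, 5, ... (so about 
sqrt n blocks, each of size at most about 2 sqrt n), with every element 
of a block no larger than any element of the next block; the 
block-of-slices representation and the names are illustrative only, not 
the actual implementation.

bool search(T)(const T[][] blocks, T key)
{
    foreach (blk; blocks)           // inspects O(sqrt n) block roots
    {
        if (blk[0] < key) continue; // root is the block max; key can't be here
        // First block whose max >= key: the only candidate. Scan it linearly.
        foreach (x; blk) if (x == key) return true;
        return false;               // later blocks hold only larger elements
    }
    return false;
}

void main()
{
    // Hand-built layout satisfying the assumed invariants.
    int[][] blocks = [[1], [4, 2, 3], [9, 7, 8, 5, 6]];
    assert(search(blocks, 6));
    assert(!search(blocks, 10));
}

The search looks at O(sqrt n) roots and then scans at most one block of 
size O(sqrt n), hence O(sqrt n) overall.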


Andrei


