An interesting data structure with search time O(sqrt n)

H. S. Teoh via Digitalmars-d digitalmars-d at puremagic.com
Mon Nov 30 13:29:00 PST 2015


On Mon, Nov 30, 2015 at 03:57:24PM -0500, Andrei Alexandrescu via Digitalmars-d wrote:
> On 11/30/15 3:20 PM, H. S. Teoh via Digitalmars-d wrote:
> >On Mon, Nov 30, 2015 at 03:13:11PM -0500, Andrei Alexandrescu via Digitalmars-d wrote:
> >>Okasaki's book is a continued inspiration of data structures and
> >>algorithms.
> >>
> >>I was thinking along the following lines: typical collections are
> >>searchable in linear time. Then we have highly structured
> >>collections that feature logarithmic search time. But there seems to
> >>be nothing in between. So I was thinking, what would be a data
> >>structure that allows O(sqrt n) search times?
> >>
> >>After a little tinkering, here's what I came up with.
> >
> >Interesting indeed.
> >
> >It leaves me wondering, though, what's the point of having an O(sqrt
> >n) collection? What are the advantages?  Why would I use this
> >structure instead of, say, a traditional array heap with O(log n)
> >search time?
> 
> (Heaps offer only linear search time. You may take advantage of the
> heap structure to skip portions of the array, but on average and in
> the worst case the search is still O(n). So I assume you meant "sorted
> array or one of the classic search trees".)

Right, I wrote that without thinking. :-P
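
For the record, here's the kind of pruned search I was thinking of -- my
own sketch in D, not anything from your post. A min-heap lets you skip any
subtree whose root already exceeds the needle, but a needle larger than
most of the heap still forces a visit to every node, so the worst case
stays O(n):

import std.stdio;

/// Search an array-backed binary min-heap, pruning subtrees whose
/// root already exceeds the needle. Still O(n) in the worst case.
bool heapSearch(const int[] heap, size_t i, int needle)
{
    if (i >= heap.length || heap[i] > needle)
        return false;               // prune: everything below is > needle
    if (heap[i] == needle)
        return true;
    return heapSearch(heap, 2*i + 1, needle)
        || heapSearch(heap, 2*i + 2, needle);
}

void main()
{
    auto h = [1, 3, 2, 7, 5, 4, 6, 9, 8]; // a valid min-heap
    writeln(heapSearch(h, 0, 9));  // true
    writeln(heapSearch(h, 0, 10)); // false, but only after visiting all n nodes
}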


> The motivation starts with a desire to use arrays as the fundamental
> layout.  There have been many trends in computing as of late, among
> which: cache hierarchies are here to stay and contiguous layout is
> preferable.
> 
> The short of it is, arrays are king. No two ways about it - following
> pointers is a losing strategy when there's an alternative. A variety
> of new data structures (Clojure's arrays, heaps with tombstones) avoid
> classic pointer-based data structures in favor of adding structure on
> top of arrays.

I'm not so sure about that. At the micro level, yes, following pointers
for a tree / graph / etc. that could easily fit in a small or medium-sized
array is a losing strategy: every hop risks a cache miss, and pointer
chasing defeats the hardware prefetchers. When you're dealing with larger
amounts of data, though, things become less clear-cut. If your array size
is on the order of MBs or GBs, pointers start looking a lot more
attractive, since shifting or reallocating that much contiguous data gets
expensive. IMO in the long run what will win is data structures that are
tree- or pointer-based at the large scale, but built on cache-friendly
array-based structures below a certain level of granularity.
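
To sketch what I mean (the node layout and capacity here are made up for
illustration), think of something B-tree-flavored: pointer hops between
nodes at the large scale, and within each node a small sorted array that
fits in a cache line or two:

enum nodeCapacity = 16; // tune to cache-line / prefetcher behavior

// Hypothetical hybrid node: pointers between nodes, a small
// contiguous sorted array within each node.
struct Node
{
    int[nodeCapacity] keys;
    size_t count;                      // slots of `keys` in use
    Node*[nodeCapacity + 1] children;  // all null in leaf nodes
}

// O(log n) node hops in the large; a cheap scan of a small,
// contiguous array within each node.
bool search(const(Node)* n, int needle)
{
    while (n !is null)
    {
        size_t i = 0;
        while (i < n.count && n.keys[i] < needle)
            ++i;                       // linear scan: small and contiguous
        if (i < n.count && n.keys[i] == needle)
            return true;
        n = n.children[i];             // descend between keys[i-1] and keys[i]
    }
    return false;
}

void main()
{
    Node root, left, right;
    left.keys[0 .. 2]  = [1, 3];  left.count  = 2;
    right.keys[0 .. 2] = [7, 9];  right.count = 2;
    root.keys[0] = 5;             root.count  = 1;
    root.children[0] = &left;
    root.children[1] = &right;
    assert(search(&root, 7) && !search(&root, 4));
}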

I agree with you, though, that array-based structures of intermediate
big-O complexities are very interesting. If you can come up with an array
structure that has the same O(sqrt n) complexity for all common operations
-- search, insert, delete -- it could be useful as a quick drop-in
solution in cases where top performance isn't required, but you'd like
something better than O(n) array scanning.
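
For concreteness, here's one shape such a structure could take -- my own
sketch, not the structure from your post: keep the elements in roughly
sqrt(n) sorted blocks with non-overlapping ranges. Search scans the block
maxima and then one block; insert shifts within one block and splits it
when it outgrows ~2*sqrt(n). Both come out to O(sqrt n). (Deletion and
block merging are omitted to keep it short.)

import std.math : sqrt;

struct SqrtSeq
{
    int[][] blocks; // each block sorted; block i's max <= block i+1's min
    size_t length;

    // O(sqrt n): scan ~sqrt(n) block maxima, then one block's elements.
    bool contains(int needle) const
    {
        foreach (b; blocks)
        {
            if (needle <= b[$ - 1])    // first block that could hold it
            {
                foreach (x; b)
                    if (x == needle) return true;
                return false;
            }
        }
        return false;
    }

    // O(sqrt n): shift within one block, split when it grows too big.
    void insert(int value)
    {
        ++length;
        if (blocks.length == 0)
        {
            blocks ~= [value];
            return;
        }
        foreach (bi; 0 .. blocks.length)
        {
            if (value <= blocks[bi][$ - 1] || bi == blocks.length - 1)
            {
                auto b = blocks[bi];
                size_t i = 0;
                while (i < b.length && b[i] < value) ++i;
                blocks[bi] = b[0 .. i] ~ value ~ b[i .. $];

                // keep block sizes near sqrt(n): split a block once it
                // grows past twice the target
                immutable target = cast(size_t) sqrt(cast(real) length) + 1;
                if (blocks[bi].length > 2 * target)
                {
                    immutable mid = blocks[bi].length / 2;
                    blocks = blocks[0 .. bi]
                        ~ [blocks[bi][0 .. mid], blocks[bi][mid .. $]]
                        ~ blocks[bi + 1 .. $];
                }
                return;
            }
        }
    }
}

void main()
{
    SqrtSeq s;
    foreach (v; [5, 1, 9, 3, 7, 2, 8, 4, 6])
        s.insert(v);
    assert(s.contains(7) && !s.contains(42));
}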


T

-- 
Philosophy: how to make a career out of daydreaming.

