An interesting data structure with search time O(sqrt n)
Timon Gehr via Digitalmars-d
digitalmars-d at puremagic.com
Thu Dec 3 17:35:17 PST 2015
On 12/03/2015 11:44 PM, Andrei Alexandrescu wrote:
> On 12/02/2015 06:26 AM, Timon Gehr wrote:
>> Assume the initial array is sorted from largest to smallest. This will
>> be the worst case for this construction algorithm, as each element will
>> be sifted all the way down to leaf level of the last heap.
> [...]
>> Ω(n^(3/2)).
>
> Thanks. My math-fu is not good enough to properly follow the argument,
There was actually an (asymptotically inconsequential) error in the
initial expression. Fixed version:
Ignoring the work performed in the heap each element starts in (which is
obviously at most O(n) in total), we can lower-bound the work performed:
for the heap of size j, we sift j elements down all the following heaps of
sizes i=j+2, i=j+4, …, i=m.
∑ⱼ[j∈{1,3,…,m}]·j·∑ᵢ[i∈{j+2,j+4,…,m}]·log(i)
=
∑ᵢ∑ⱼ[j∈{1,3,…,m}]·[i∈{j+2,j+4,…,m}]·j·log(i)
=
∑ᵢ∑ⱼ∑ₖ∑ₗ[i=2·k+1]·[j=2·l+1]·[1≤j≤m]·[j+2≤i≤m]·j·log(i)
=
∑ᵢ∑ₖ[i=2·k+1]·[3≤i≤m]·log(i)·∑ⱼ∑ₗ[j=2·l+1]·[1≤j≤m]·[j+2≤i]·j
=
∑ᵢ∑ₖ[i=2·k+1]·[3≤i≤m]·log(i)·∑ⱼ∑ₗ[j=2·l+1]·[1≤j≤i-2]·j
=
∑ᵢ∑ₖ[i=2·k+1]·[3≤i≤m]·log(i)·∑ⱼ∑ₗ[j=2·l+1]·[1≤2·l+1≤i-2]·(2·l+1)
=
∑ᵢ∑ₖ[i=2·k+1]·[3≤i≤m]·log(i)·∑ₗ[1≤2·l+1≤i-2]·(2·l+1)
=
∑ᵢ∑ₖ[i=2·k+1]·[3≤i≤m]·log(i)·∑ₗ[0≤l≤(i-3)/2]·(2·l+1)
=
∑ᵢ∑ₖ[i=2·k+1]·[3≤i≤m]·log(i)·((i-1)/2)²
=
∑ₖ[3≤2·k+1≤m]·log(2·k+1)·k²
=
∑ₖ[1≤k≤(m-1)/2]·log(2·k+1)·k²
≥
Ω(m³)
=
Ω(n^(3/2)).
Less formally:
The heap of size 3 is sifted through by 1=1² element from the first heap.
The heap of size 5 is sifted through by 1+3=2² elements from the first
and second heaps.
The heap of size 7 is sifted through by 1+3+5=3² elements from the first
three heaps.
...
The heap of size 2·k+1 is sifted through by k² elements.
...
The heap of size m is sifted through by ((m-1)/2)² elements.
This gives the last sum in the above derivation:
∑ₖ[1≤k≤(m-1)/2]·log(2·k+1)·k²
Here k ranges from 1 to (m-1)/2; the factor log(2·k+1) is a lower bound on
the cost of sifting one element through the heap of size 2·k+1, and the
factor k² is the number of elements sifting through the heap of size
2·k+1.
The sum can be lower bounded by
1²+2²+3²+…+((m-1)/2)²
=
(m+1)·(m-1)·(m-3)/24+(m+1)·(m-1)/8
=
(m³-m)/24
≥
Ω(m³).
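For what it's worth, that closed form is easy to double-check mechanically;
a throwaway D snippet, purely as a sanity check:

import std.stdio;

void main()
{
    // Verify 1² + 2² + … + ((m-1)/2)² == (m³ - m)/24 for odd m.
    foreach (m; [3, 5, 7, 101, 1001])
    {
        long s = 0;
        foreach (k; 1 .. (m - 1) / 2 + 1)
            s += cast(long) k * k;
        assert(s == (cast(long) m * m * m - m) / 24);
    }
    writeln("closed form checked");
}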
> but that bound sounds high; one intuition is that there shouldn't be
> more work done than in sorting - fewer swaps, fewer comparisons, less
> resulting structure.
> ...
That intuition is spot on, but the sorting algorithm that the
construction algorithm imitates most closely is a version of insertion sort.
> For example, heapifying the array 1, 2, 3, 4, 5, 6, 7, 8, 9 in 1+3+5 max
> heaps results in the array 9, 8, 6, 7, 5, 4, 3, 2, 1 which is "less
> sorted" than the fully sorted array. (I know, that doesn't prove anything.)
> ...
It provides motivation to try and find another construction algorithm
that runs asymptotically faster than n log n, or to prove that n log n
is asymptotically optimal.
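For concreteness, here is a small D sketch of the construction under
discussion (my own illustration, not anyone's actual implementation): the
max-heaps of sizes 1,3,5,… are laid out back to back, each leaf of one heap
gets the root of the following heap as its only child, and the whole
structure is heapified by sifting every element down, last index first.
On 1,…,9 it reproduces the array 9, 8, 6, 7, 5, 4, 3, 2, 1 quoted above:

import std.algorithm : swap;
import std.stdio;

// Sift a[g] down through the chain of max-heaps of sizes 1, 3, 5, ...
// laid out back to back in a[]. Inside a heap a node has the usual two
// children; a leaf of one heap has the next heap's root as its only child.
void siftDown(int[] a, size_t g)
{
    size_t start = 0, size = 1;             // locate the heap containing g
    while (g >= start + size) { start += size; size += 2; }
    for (;;)
    {
        immutable l = g - start;            // index local to the current heap
        immutable c1 = start + 2 * l + 1;   // in-heap children, if any
        immutable c2 = c1 + 1;
        size_t best = g;
        if (2 * l + 1 < size && c1 < a.length && a[c1] > a[best]) best = c1;
        if (2 * l + 2 < size && c2 < a.length && a[c2] > a[best]) best = c2;
        if (2 * l + 1 >= size && start + size < a.length
                && a[start + size] > a[best])
            best = start + size;            // leaf: child is next heap's root
        if (best == g) return;
        swap(a[g], a[best]);
        if (best == start + size) { start += size; size += 2; }
        g = best;
    }
}

void main()
{
    auto a = [1, 2, 3, 4, 5, 6, 7, 8, 9];
    foreach_reverse (g; 0 .. a.length)      // children before parents
        siftDown(a, g);
    writeln(a);                             // [9, 8, 6, 7, 5, 4, 3, 2, 1]
}

Note that in this max-heap layout the ascending array is the expensive
input: every element entering near the front travels down to the leaf level
of the last heap, which is exactly what the bound above counts.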
> At this point I need to either get to the bottom of the math or
The basic argument is not too involved: if each element needs to be
sifted all the way down to the last heap, then for each of the ~√n̅ heaps,
you need to sift all of its ~√n̅ elements down ~√n̅ other heaps, which
requires more than ≈n^(3/2) operations. The above derivation makes this
rigorous.
> put together an experimental rig that counts comparisons and swaps.
>
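Such a rig is quick to put together. The sketch below (again just my
illustration) is the same sift-down as above with a swap counter threaded
through, fed ascending arrays of size n=t²; for this max-heap layout the
ascending array plays the role of the descending worst case assumed
earlier. If the bound is right, swaps/n^(3/2) stays bounded away from
zero; in fact it should creep upward slowly, matching the extra log factor
visible in the exact sum:

import std.algorithm : swap;
import std.math : sqrt;
import std.range : array, iota;
import std.stdio;

// As in the construction sketch, but counting swaps.
void siftDown(int[] a, size_t g, ref ulong swaps)
{
    size_t start = 0, size = 1;
    while (g >= start + size) { start += size; size += 2; }
    for (;;)
    {
        immutable l = g - start;
        immutable c1 = start + 2 * l + 1;
        immutable c2 = c1 + 1;
        size_t best = g;
        if (2 * l + 1 < size && c1 < a.length && a[c1] > a[best]) best = c1;
        if (2 * l + 2 < size && c2 < a.length && a[c2] > a[best]) best = c2;
        if (2 * l + 1 >= size && start + size < a.length
                && a[start + size] > a[best])
            best = start + size;
        if (best == g) return;
        swap(a[g], a[best]);
        ++swaps;
        if (best == start + size) { start += size; size += 2; }
        g = best;
    }
}

void main()
{
    foreach (t; [8, 16, 32, 64, 128])
    {
        immutable n = t * t;                // 1+3+…+(2·t-1) = t²
        auto a = iota(1, n + 1).array;      // ascending: the expensive input
        ulong swaps = 0;
        foreach_reverse (g; 0 .. a.length)
            siftDown(a, g, swaps);
        writefln("n=%6d  swaps=%10d  swaps/n^1.5=%.3f",
                 n, swaps, swaps / (n * sqrt(cast(double) n)));
    }
}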
> (BTW I figured one reason for which these DAG heaps work at all is that
> there's never a need to sift an element up between heaps; that would be
> inefficient because the root of a subheap has multiple parents. Sifting
> down is efficient because each node has at most two children.)
> ...
Good point. I had declared victory on the "insert" front too early.