random cover of a range

Fri Feb 13 21:20:26 PST 2009

Bill Baxter wrote:
> On Sat, Feb 14, 2009 at 1:03 PM, Andrei Alexandrescu 
> <SeeWebsiteForEmail at erdani.org <mailto:SeeWebsiteForEmail at erdani.org>> 
> wrote:
> 
>     bearophile wrote:
> 
>         Andrei Alexandrescu:
> 
>             Say at some point there are k available (not taken) slots out of
>             "n". There is a k/n chance that a random selection finds an
>             unoccupied slot. The average number of random trials needed
>             to find
>             an unoccupied slot is proportional to 1/(k/n) = n/k. So the
>             total
>             number of random trials to span the entire array is quadratic.
>             Multiplying that by 0.9 leaves it quadratic.
> 
> 
>         It's like in hashing: if you want to fill 99% of the available space
>         in a hash, then you take ages to find empty slots. But if you
>         fill it
>         only at 75-90%, then on average you need only one or two tries to
>         find an empty slot. So your time is linear, with a small
>         multiplicative constant. When the slots start to get mostly
>         full, you
>         change algorithm, copying the empty slots elsewhere.
> 
> 
>     Well I don't buy it. If you make a point, you need to be more
>     precise than such hand-waving. It's not like in hashing. It's like
>     in the algorithm we discuss. If you make a clear point that your
>     performance is better than O(n*n) by stopping at 90% then make it. I
>     didn't go through much formalism, but my napkin says you're firmly
>     in quadratic territory.
> 
>  
> Well he has a point that the number of trials required to find an empty 
> depends not on the absolute number of empty items, but only the ratio of 
> empties to fulls.   Even your own claim about average number of trials 
> was n/k -- not sure how you got that though.

If you toss a N-side dice hoping for a specific face to show up (and 
stopping afterwards), how many times do you have to toss it on average? 
I recall (without being sure) that you need to toss it a number of times 
proportional to N. Could anyone confirm or deny?

Now, assuming the above is true, say we decide to do the linear search 
for a fraction f of the n elements in the array. The average number of 
steps will be:

S = n * (1/n + 1/(n - 1) + 1/(n - 2) + ... + 1/(n - fn))

So we're looking for the harmonic series that does not start from 1, but 
instead starts from its fn'th term (you drop the first (1-f)n terms). 
Does it converge? (I don't know.)

Andrei