random cover of a range

Bill Baxter wbaxter at gmail.com
Fri Feb 13 21:58:06 PST 2009


On Sat, Feb 14, 2009 at 2:20 PM, Andrei Alexandrescu
<SeeWebsiteForEmail at erdani.org> wrote:
> Bill Baxter wrote:
>>
>> On Sat, Feb 14, 2009 at 1:03 PM, Andrei Alexandrescu
>> <SeeWebsiteForEmail at erdani.org> wrote:
>>
>>    bearophile wrote:
>>
>>        Andrei Alexandrescu:
>>
>>            Say at some point there are k available (not taken) slots
>>            out of "n". There is a k/n chance that a random selection
>>            finds an unoccupied slot. The average number of random
>>            trials needed to find an unoccupied slot is proportional
>>            to 1/(k/n) = n/k. So the total number of random trials to
>>            span the entire array is quadratic. Multiplying that by
>>            0.9 leaves it quadratic.
>>
>>
>>        It's like in hashing: if you want to fill 99% of the available
>>        space in a hash, then you take ages to find empty slots. But if
>>        you fill it only at 75-90%, then on average you need only one or
>>        two tries to find an empty slot. So your time is linear, with a
>>        small multiplicative constant. When the slots start to get mostly
>>        full, you change algorithm, copying the empty slots elsewhere.
>>
>>
>>    Well I don't buy it. If you make a point, you need to be more
>>    precise than such hand-waving. It's not like in hashing. It's like
>>    in the algorithm we discuss. If you make a clear point that your
>>    performance is better than O(n*n) by stopping at 90% then make it. I
>>    didn't go through much formalism, but my napkin says you're firmly
>>    in quadratic territory.
>>
>>  Well he has a point that the number of trials required to find an empty
>> slot depends not on the absolute number of empty items, but only on the
>> ratio of empties to fulls.  Even your own claim about the average number
>> of trials was n/k -- not sure how you got that though.
>
> If you toss an N-sided die hoping for a specific face to show up (and
> stopping afterwards), how many times do you have to toss it on average? I
> recall (without being sure) that you need to toss it a number of times
> proportional to N. Could anyone confirm or deny?

The probability of *not* getting the face after t tosses is (1-1/n)^t, so
it's just a pure decaying exponential, and the chance of having gotten it
within t tosses is 1 minus that.  The average number of tosses is the sum
of those "not yet" probabilities over all t >= 0, and that geometric series
sums to exactly n.  So your memory is right: on average you need n tosses
-- it's a geometric distribution with success probability 1/n.
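
Spelling the sum out, with T = the number of tosses until the face shows up:

  E[T] = \sum_{t \ge 0} P(T > t) = \sum_{t \ge 0} (1 - 1/n)^t
       = \frac{1}{1 - (1 - 1/n)} = n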

> Now, assuming the above is true, say we decide to do the linear search for a
> fraction f of the n elements in the array.

Maybe you were switching subjects (to your algorithm?) here, but just
to be clear, bearophile didn't say anything about doing a linear search.
His first phase just does repeated dart-throwing trials until an empty
slot is found, and only switches strategy once the array is mostly full.
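
Just to make the two-phase idea concrete, here's a rough, untested sketch
in D -- the 90% cutoff and the names are mine, and I'm assuming
std.random's uniform and randomShuffle:

import std.random;

// Sketch only: dart-throw until ~90% of the slots are taken, then
// shuffle whatever is left and append it.
size_t[] randomCoverSketch(size_t n)
{
    auto taken = new bool[n];
    size_t[] order;
    size_t filled = 0;

    // Phase 1: repeated dart throwing while empties are still plentiful.
    while (filled < n * 9 / 10)
    {
        auto i = uniform(0, n);   // one random trial
        if (!taken[i])
        {
            taken[i] = true;
            order ~= i;
            ++filled;
        }
    }

    // Phase 2: gather the remaining empties and shuffle them.
    size_t[] rest;
    foreach (i; 0 .. n)
        if (!taken[i])
            rest ~= i;
    randomShuffle(rest);

    return order ~ rest;
}

As long as at least 10% of the slots are still empty, each dart needs at
most about 10 tries on average, so phase 1 stays linear; phase 2 is just
one more pass plus a shuffle.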

> The average number of steps will
> be:
>
> S = n * (1/n + 1/(n - 1) + 1/(n - 2) + ... + 1/(n - fn))
>
> So we're looking for the harmonic series that does not start from 1, but
> instead starts from its fn'th term (you drop the first (1-f)n terms). Does
> it converge? (I don't know.)

In the limit of what?  You seem to be asking if a finite series converges.
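
If what you're after is the size of that partial sum, the usual
H_m ~ ln m approximation for harmonic numbers (treating fn as an integer
and ignoring lower-order terms) gives

  S = n (H_n - H_{n - fn - 1})
    \approx n (\ln n - \ln((1 - f) n))
    = n \ln \frac{1}{1 - f}

so for any fixed f < 1 it's linear in n -- roughly 2.3 n for f = 0.9.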

--bb


