random cover of a range

Bill Baxter wbaxter at gmail.com
Sat Feb 14 04:14:18 PST 2009


On Sat, Feb 14, 2009 at 8:28 PM, bearophile <bearophileHUGS at lycos.com> wrote:
> Andrei Alexandrescu:
>>Well I don't buy it. If you make a point, you need to be more precise than such hand-waving.<
>
> You are right, but I am not able to give you good math here.
> (I have used that algorithm to extract all the possible uints, in a few minutes and with less than 1GB of RAM. That, plus my experience with hashes, tells me that this algorithm is good enough. If you want I can also implement it again and test it with a growing number of items, to see from the graph of the timings whether it's indeed linear.)
>
>
> Bill B.:
>>.. the real gotcha is that he's using O(n) extra memory. 10% of n is still O(n) memory for shuffling the last 10% of the items.<
>
> No, you are wrong, in real computers using 110% of the memory needed to store your n items is generally acceptable.

Yah, I just meant it was a big-oh gotcha.  Though now that I think of
it, the bit array is also using O(N) extra memory.  I agree that
big-oh doesn't always give you a very good picture of how an
algorithm will perform in practice: it completely ignores very
important aspects like cache effects and physical memory limitations
on practical problem sizes.
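
To make that concrete, here's a rough D sketch of the kind of hybrid
scheme being discussed: reject-and-retry against a visited array until
the range is mostly covered, then shuffle whatever is left.  The 0.9
cutoff and the function name are just my own picks for illustration,
not anything from bearophile's actual code.

import std.random : randomShuffle, uniform;
import std.stdio : writeln;

// Rough sketch only: draw random indices, mark them in a visited
// array, and once coverage passes `cutoff` collect the unvisited
// remainder and shuffle it.  Cutoff value and name are arbitrary.
size_t[] hybridCover(size_t n, double cutoff = 0.9)
{
    auto visited = new bool[n];
    size_t[] order;
    order.reserve(n);

    // Phase 1: rejection sampling while empty slots are still common.
    while (order.length < cast(size_t)(n * cutoff))
    {
        immutable i = uniform(0, n);   // random index in [0, n)
        if (!visited[i])
        {
            visited[i] = true;
            order ~= i;
        }
    }

    // Phase 2: the last (1 - cutoff) fraction, gathered and shuffled.
    size_t[] rest;
    foreach (i; 0 .. n)
        if (!visited[i])
            rest ~= i;
    randomShuffle(rest);
    return order ~ rest;
}

void main()
{
    writeln(hybridCover(10));
}

The visited array and the `rest` buffer at the end are exactly the
O(n) extras we've been talking about above.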

I found a PDF[1] that has a summary of the probabilities associated
with finding empty slots in the context of hashing.  If I'm reading
it right, with a load factor (fraction full) of f, you should expect
it to take 1/(1-f) trials to find an empty slot if you just keep
picking slots at random, one after the other.
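
For anyone who wants to sanity-check that number without reading the
PDF, here's a throwaway D test of my own (not from the paper): fill a
table to load factor f, then count how many uniform random probes it
takes on average to land on an empty slot.

import std.random : uniform;
import std.stdio : writefln;

void main()
{
    enum n = 100_000;
    enum trials = 10_000;
    foreach (f; [0.5, 0.9, 0.99])
    {
        auto full = new bool[n];
        foreach (i; 0 .. cast(size_t)(n * f))
            full[i] = true;        // which slots are full doesn't matter

        size_t probes = 0;
        foreach (t; 0 .. trials)
        {
            size_t i;
            do { i = uniform(0, n); ++probes; } while (full[i]);
        }
        writefln("f = %.2f: measured %.2f probes, predicted %.2f",
                 f, cast(double) probes / trials, 1.0 / (1.0 - f));
    }
}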

--bb
[1] http://www.cse.unsw.edu.au/~cs2011/lect/2711_HashProbe-4.pdf


