random k-sample of a file
Andrei Alexandrescu
SeeWebsiteForEmail at erdani.org
Fri Oct 10 08:09:18 PDT 2008
bearophile wrote:
> Andrei Alexandrescu:
>>> Ah, ok. It works if you do that. So bearophile's code must say
>>>
>>> if random() > (1.0 / (i+1)):
>>>
>>> instead of
>>>
>>> if random() < (1.0 / (i+1)):
>>>
>>> (that is, if random() returns a real number between 0 and 1 with uniform
>>> distribution)
>> His prize gets retired. Now how does Kirk's code fare? :o)
>
> I don't like contests...
>
> This simple benchmark shows that my third version is wrong in both ways, and it keeps being wrong even if you remove the chosen.append(line), etc.
>
> The problem isn't easy, this paper shows that it's not easy to create an efficient implementation even when the total line count is known:
> http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.16.4335
No, that's a different problem. They want to extract n random samples
from N records *without replacement*. It's a very different ballgame.
See http://www.cs.duke.edu/~jsv/Papers/catalog/node139.html
The problem as I stated is reasonably easy and my implementation should
be correct.
Andrei
More information about the Digitalmars-d
mailing list