random k-sample of a file
Jason House
jason.james.house at gmail.com
Thu Oct 9 12:55:03 PDT 2008
Andrei Alexandrescu Wrote:
> I just ran across a nice little problem that I wanted to share as an
> exercise for the interested in futile pastimes.
>
> The challenge, should you accept it, is to write a program that given a
> number k and a file, outputs k lines picked uniformly at random from the
> file. (If the file has less than k lines, it should all be output.) The
> file's size is unbounded so loading it in memory is not an option. How'd
> you go about it?
>
>
> Andrei
Based on your later stated requirements, use a heap. Insert each line into the heap using a random number as the key. After pre-loading the first k elements, each insert into the heap is followed by the removal of the head element.
Performance is O(characters in file + log2(k)*lines in file)
Obviously, the random number generator needs a range of values that exceeds the lines in the file by a good margin, or else you'll get biases in what gets kept.
More information about the Digitalmars-d
mailing list