random k-sample of a file

Jason House jason.james.house at gmail.com
Thu Oct 9 12:55:03 PDT 2008


Andrei Alexandrescu Wrote:

> I just ran across a nice little problem that I wanted to share as an 
> exercise for the interested in futile pastimes.
> 
> The challenge, should you accept it, is to write a program that given a 
> number k and a file, outputs k lines picked uniformly at random from the 
> file. (If the file has less than k lines, it should all be output.) The 
> file's size is unbounded so loading it in memory is not an option. How'd 
> you go about it?
> 
> 
> Andrei


Based on your later stated requirements, use a heap.  Insert each line into the heap using a random number as the key.  After pre-loading the first k elements, each insert into the heap is followed by the removal of the head element.

Performance is O(characters in file + log2(k)*lines in file)

Obviously, the random number generator needs a range of values that exceeds the lines in the file by a good margin, or else you'll get biases in what gets kept.





More information about the Digitalmars-d mailing list