random k-sample of a file

Andrei Alexandrescu SeeWebsiteForEmail at erdani.org
Thu Oct 9 13:04:20 PDT 2008


Steven Schveighoffer wrote:
> "Andrei Alexandrescu" wrote
>> Walter Bright wrote:
>>> Andrei Alexandrescu wrote:
>>>> The challenge, should you accept it, is to write a program that given a 
>>>> number k and a file, outputs k lines picked uniformly at random from the 
>>>> file. (If the file has less than k lines, it should all be output.) The 
>>>> file's size is unbounded so loading it in memory is not an option. How'd 
>>>> you go about it?
>>> 1. read the file, counting lines => nlines
>>> 2. select k random numbers 0..nlines
>>> 3. read the file line by line again, outputting the line if it's one of 
>>> the selected lines
>> No go if the file is a stream. Can it be done in one pass?
> 
> Any bounds on the input k?  If not, then no.  In that case, you must do one 
> pass to count the lines, otherwise, you would end up possibly throwing away 
> lines that needed to be output to satisfy k.

k is a "small" number in the sense that you can keep something 
proportional to k in working memory. You shouldn't keep something 
proportional to n in working memory.

Andrei



More information about the Digitalmars-d mailing list