random k-sample of a file

Steven Schveighoffer schveiguy at yahoo.com
Thu Oct 9 12:57:03 PDT 2008


"Andrei Alexandrescu" wrote
> Walter Bright wrote:
>> Andrei Alexandrescu wrote:
>>> The challenge, should you accept it, is to write a program that given a 
>>> number k and a file, outputs k lines picked uniformly at random from the 
>>> file. (If the file has less than k lines, it should all be output.) The 
>>> file's size is unbounded so loading it in memory is not an option. How'd 
>>> you go about it?
>>
>> 1. read the file, counting lines => nlines
>> 2. select k random numbers 0..nlines
>> 3. read the file line by line again, outputting the line if it's one of 
>> the selected lines
>
> No go if the file is a stream. Can it be done in one pass?

Any bounds on the input k?  If not, then no.  In that case, you must do one 
pass to count the lines, otherwise, you would end up possibly throwing away 
lines that needed to be output to satisfy k.

-Steve 





More information about the Digitalmars-d mailing list