random k-sample of a file
Steven Schveighoffer
schveiguy at yahoo.com
Thu Oct 9 12:57:03 PDT 2008
"Andrei Alexandrescu" wrote
> Walter Bright wrote:
>> Andrei Alexandrescu wrote:
>>> The challenge, should you accept it, is to write a program that given a
>>> number k and a file, outputs k lines picked uniformly at random from the
>>> file. (If the file has less than k lines, it should all be output.) The
>>> file's size is unbounded so loading it in memory is not an option. How'd
>>> you go about it?
>>
>> 1. read the file, counting lines => nlines
>> 2. select k random numbers 0..nlines
>> 3. read the file line by line again, outputting the line if it's one of
>> the selected lines
>
> No go if the file is a stream. Can it be done in one pass?
Any bounds on the input k? If not, then no. In that case, you must do one
pass to count the lines, otherwise, you would end up possibly throwing away
lines that needed to be output to satisfy k.
-Steve
More information about the Digitalmars-d
mailing list