random k-sample of a file

Denis Koroskin 2korden at gmail.com
Thu Oct 9 12:19:02 PDT 2008


On Thu, 09 Oct 2008 23:16:38 +0400, Ary Borenszweig <ary at esperanto.org.ar>  
wrote:

> Walter Bright wrote:
>> Andrei Alexandrescu wrote:
>>> The challenge, should you accept it, is to write a program that given  
>>> a number k and a file, outputs k lines picked uniformly at random from  
>>> the file. (If the file has less than k lines, it should all be  
>>> output.) The file's size is unbounded so loading it in memory is not  
>>> an option. How'd you go about it?
>>  1. read the file, counting lines => nlines
>> 2. select k random numbers 0..nlines
>> 3. read the file line by line again, outputting the line if it's one of  
>> the selected lines
>
> Can't this be done in one single pass?
>
> My thoughts were:
>
> 1. read the first k lines and assign them to the ones to return (the  
> "stored lines")
>

What if there is only one line in a file? Andrei told that

> The file's size is unbounded so loading it in memory is not an option.



More information about the Digitalmars-d mailing list