random k-sample of a file

Andrei Alexandrescu SeeWebsiteForEmail at erdani.org
Thu Oct 9 12:21:20 PDT 2008


Denis Koroskin wrote:
> On Thu, 09 Oct 2008 23:16:38 +0400, Ary Borenszweig 
> <ary at esperanto.org.ar> wrote:
> 
>> Walter Bright wrote:
>>> Andrei Alexandrescu wrote:
>>>> The challenge, should you accept it, is to write a program that 
>>>> given a number k and a file, outputs k lines picked uniformly at 
>>>> random from the file. (If the file has less than k lines, it should 
>>>> all be output.) The file's size is unbounded so loading it in memory 
>>>> is not an option. How'd you go about it?
>>>  1. read the file, counting lines => nlines
>>> 2. select k random numbers 0..nlines
>>> 3. read the file line by line again, outputting the line if it's one 
>>> of the selected lines
>>
>> Can't this be done in one single pass?
>>
>> My thoughts were:
>>
>> 1. read the first k lines and assign them to the ones to return (the 
>> "stored lines")
>>
> 
> What if there is only one line in a file? Andrei told that
> 
>> The file's size is unbounded so loading it in memory is not an option.

Keeping k of the lines in memory is fine.

Andrei



More information about the Digitalmars-d mailing list