random k-sample of a file

BCS ao at pathlink.com
Thu Oct 9 12:30:25 PDT 2008


Reply to Andrei,

> Walter Bright wrote:
> 
>> Andrei Alexandrescu wrote:
>> 
>>> The challenge, should you accept it, is to write a program that
>>> given a number k and a file, outputs k lines picked uniformly at
>>> random from the file. (If the file has less than k lines, it should
>>> all be output.) The file's size is unbounded so loading it in memory
>>> is not an option. How'd you go about it?
>>> 
>> 1. read the file, counting lines => nlines
>> 2. select k random numbers 0..nlines
>> 3. read the file line by line again, outputting the line if it's one
>> of
>> the selected lines
> No go if the file is a stream. Can it be done in one pass?
> 
> Andrei
> 

read and store the first k lines with there line number, 
if no lines remain, output stored lines
read another line 
replace an existing line with a probability based on the current line and 
the replaced line's location
if no lines remain ...


Now all that remains is to figure out that function





More information about the Digitalmars-d mailing list