random k-sample of a file

Jason House jason.james.house at gmail.com
Thu Oct 9 14:46:33 PDT 2008


BCS Wrote:

> Reply to Andrei,
> 
> > I just ran across a nice little problem that I wanted to share as an
> > exercise for the interested in futile pastimes.
> > 
> > The challenge, should you accept it, is to write a program that given
> > a number k and a file, outputs k lines picked uniformly at random from
> > the file. (If the file has less than k lines, it should all be
> > output.) The file's size is unbounded so loading it in memory is not
> > an option. How'd you go about it?
> > 
> > Andrei
> > 
> 
> struct S
> {
>  char[] line;
>  real v;
>  int opCmp(S s)
>  {
>   if(s.v < this.v) return -1;
>   else if(s.v > this.v) return +1;
>   return 0;
>  }
> }
> 
> int k;
> S[] buf = new S[k+1];
> 
> foreach(s;buf) s.v = real.max;
> 
> foreach(char[] line; stream)
> {
>     delete buf[$-1].line;
>     buf[$-1].line = line.dup;;
>     buf[$-1].v = reaqlRand();  // returns v > real.min and v < real.max
>     buf.sort;
> }
> 
> foreach(s;buf) if(s.v != real.max; writef("%s\n", s.line);
> 
> 

You should check if your new random number is good enough to keep before doing the array manipulation and string dup... That small change should yield an near-optimal big O. A heap should give slightly faster performance for moderate values of k.



More information about the Digitalmars-d mailing list