random k-sample of a file
Andrei Alexandrescu
SeeWebsiteForEmail at erdani.org
Thu Oct 9 13:30:41 PDT 2008
bearophile wrote:
> Third solution; this requires storage for k lines (but you can keep that storage on disk):
>
> from sys import argv
> from random import random, randrange
> # randrange(n) gives a random integer in [0, n)
>
> filename = argv[1]
> k = int(argv[2])
> assert k > 0
>
> chosen_lines = []
> with open(filename) as f:
>     for i, line in enumerate(f):
>         if i < k:
>             # Fill the sample with the first k lines.
>             chosen_lines.append(line)
>         elif random() < k / (i + 1.0):
>             # Keep line i (0-based) with probability k/(i+1),
>             # evicting a uniformly chosen current member.
>             chosen_lines[randrange(k)] = line
>
> print(chosen_lines)
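A minimal sketch of the same single-pass idea (reservoir sampling) packaged as a reusable function, with a quick empirical check of uniformity. The name `reservoir_sample` and its `rng` parameter are hypothetical, introduced only for illustration. Note that for k > 1 the acceptance probability for line i (0-based) must be k/(i+1), not 1/(i+1); here that is done by drawing `j` uniformly from [0, i] and replacing slot `j` only when `j < k`.

```python
import random
from collections import Counter
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(items: Iterable[T], k: int,
                     rng: Optional[random.Random] = None) -> List[T]:
    """Return a uniform random k-subset of items in a single pass.

    Uses O(k) memory however many items there are.
    If there are fewer than k items, all of them are returned.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    if rng is None:
        rng = random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(items):
        if i < k:
            # The first k items always enter the reservoir.
            reservoir.append(item)
        else:
            # j is uniform in [0, i]; j < k happens with probability
            # k/(i+1), and then slot j is a uniform choice among k slots.
            j = rng.randrange(i + 1)
            if j < k:
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    # Empirical check: each of 10 items should land in a 3-sample
    # roughly 3/10 of the time.
    rng = random.Random(12345)
    trials = 20_000
    counts: Counter = Counter()
    for _ in range(trials):
        counts.update(reservoir_sample(range(10), 3, rng))
    for item in range(10):
        print(item, round(counts[item] / trials, 3))
```

To sample lines from a file, pass the open file object directly, e.g. `with open(path) as f: print(reservoir_sample(f, k))`; the file is read once and never held in memory as a whole.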
We have a winner!!! There is actually a very simple proof of how and why
this works.
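The proof is not spelled out in the message; one standard argument is by induction, sketched here for the version that keeps line i (0-based) with probability k/(i+1) and then overwrites a uniformly chosen slot.

```latex
\textbf{Claim.} After lines $0,\dots,n-1$ have been read ($n \ge k$),
each of them is in the sample with probability $k/n$.

\textbf{Base.} For $n = k$ every line is kept, so the probability is $k/k = 1$.

\textbf{Step.} Line $n$ enters with probability $\frac{k}{n+1}$. An earlier
line already in the sample is evicted only if line $n$ enters and lands in
its slot:
\[
  \Pr[\text{stay}] = 1 - \frac{k}{n+1}\cdot\frac{1}{k} = \frac{n}{n+1},
\]
hence
\[
  \Pr[\text{earlier line in sample}] = \frac{k}{n}\cdot\frac{n}{n+1} = \frac{k}{n+1}.
\]
Both the new line and every earlier line are therefore in the sample with
probability $k/(n+1)$, which completes the induction.
```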
Andrei
More information about the Digitalmars-d mailing list