random k-sample of a file
bearophile
bearophileHUGS at lycos.com
Thu Oct 9 12:56:41 PDT 2008
Third solution; this one requires storage for k lines (but you can keep this storage on disk):
from sys import argv
from random import random, randrange
# randrange(n) gives a random integer in [0, n)

filename = argv[1]
k = int(argv[2])
assert k > 0

chosen_lines = []
for i, line in enumerate(open(filename)):
    if i < k:
        # fill the storage with the first k lines
        chosen_lines.append(line)
    else:
        # keep line i with probability k/(i+1), not 1/(i+1),
        # so that every line ends up chosen with probability k/n
        if random() < (float(k) / (i + 1)):
            chosen_lines[randrange(k)] = line

print(chosen_lines)
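The same idea can be wrapped in a function so it is easier to test. What follows is only a rough sketch, assuming Python 3 syntax and the usual reservoir-sampling rule where item i (counting from 0) is kept with probability k/(i+1). The name `reservoir_sample` and the `rng` parameter are made up for this illustration and are not part of the script above.

```python
import random
from collections import Counter


def reservoir_sample(items, k, rng=random):
    """Return up to k items picked uniformly at random from an iterable.

    Hypothetical helper for illustration; needs storage for k items only.
    """
    chosen = []
    for i, item in enumerate(items):
        if i < k:
            # The first k items always fill the storage.
            chosen.append(item)
        else:
            # Draw j uniformly from [0, i]. j < k happens with probability
            # k/(i+1), and in that case j is also a uniformly random slot.
            j = rng.randrange(i + 1)
            if j < k:
                chosen[j] = item
    return chosen


if __name__ == "__main__":
    # Rough uniformity check: each of 10 items should be chosen
    # in about 3/10 of the trials.
    rng = random.Random(0)
    counts = Counter()
    trials = 20000
    for _ in range(trials):
        counts.update(reservoir_sample(range(10), 3, rng))
    for item in sorted(counts):
        print(item, counts[item] / float(trials))
```

Drawing a single index j in [0, i] does the job of both the probability test and the choice of slot, so only one random number is needed per line. If the input has fewer than k items, the sketch simply returns all of them.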
Now I'll look for possible bugs in this third version and look into the problem Andrei has just told me about :-)
Bye,
bearophile