random k-sample of a file

bearophile bearophileHUGS at lycos.com
Thu Oct 9 12:56:41 PDT 2008


Third solution. This one needs storage for only k lines (and you can even keep that storage on disk):

from sys import argv
from random import random, randrange
# randrange gives a random integer in [0, n)

filename = argv[1]
k = int(argv[2])
assert k > 0

chosen_lines = []
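for i, line in enumerate(open(filename)):
    if i < k:
        # The first k lines fill the reservoir.
        chosen_lines.append(line)
    else:
        # Line i (0-based) must replace a stored line with probability
        # k/(i+1), not 1/(i+1), for every line to be equally likely.
        if random() < (float(k) / (i+1)):
            chosen_lines[randrange(k)] = line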

print(chosen_lines)
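
For comparison, here is a minimal sketch of the same idea written as a reusable function, in the style usually called reservoir sampling (Algorithm R). It is not the code from this thread: the name reservoir_sample and the rng parameter are made up for illustration, and it assumes Python 3. Instead of testing random() and then calling randrange(k), it draws one index j in [0, i+1) and replaces slot j only when j < k. That should give the same acceptance probability of k/(i+1) and a uniformly chosen slot.

```python
import random


def reservoir_sample(lines, k, rng=random):
    """Return up to k items drawn uniformly from an iterable of unknown length.

    `rng` is a hypothetical hook: any object with a randrange() method,
    e.g. random.Random(seed) for reproducible runs.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    chosen = []
    for i, line in enumerate(lines):
        if i < k:
            # The first k items fill the reservoir.
            chosen.append(line)
        else:
            # j is uniform over [0, i + 1), so j < k happens with
            # probability k / (i + 1); j then also names the slot to replace.
            j = rng.randrange(i + 1)
            if j < k:
                chosen[j] = line
    return chosen


if __name__ == "__main__":
    import sys

    # Usage: python sample.py FILENAME K
    with open(sys.argv[1]) as f:
        print(reservoir_sample(f, int(sys.argv[2])))
```

If the file has fewer than k lines, the function just returns all of them.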

Now I'll look for possible bugs in this third version, and at the problem Andrei has just told me about :-)

Bye,
bearophile


