random k-sample of a file
bearophile
bearophileHUGS at lycos.com
Thu Oct 9 12:29:38 PDT 2008
I am not reading the answers written by others, of course :-) With the help of "Programming Pearls", here is my second version. It doesn't keep the chosen lines in memory, so it runs with very little memory (at the cost of reading the file twice):
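import sys
from random import random

filename = sys.argv[1]
k = int(sys.argv[2])

# First pass: count the lines.
with open(filename) as f:
    nlines = sum(1 for _ in f)

# Second pass: choose exactly k lines, keeping their original order.
with open(filename) as f:
    if k >= nlines:
        # Not enough lines to choose from: output the whole file.
        sys.stdout.writelines(f)
    else:
        select = k          # lines still to be chosen
        remaining = nlines  # lines not yet examined, including this one
        for line in f:
            # Keep this line with probability select / remaining.
            if random() < select / remaining:
                sys.stdout.write(line)  # line already ends with "\n"
                select -= 1
            remaining -= 1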
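The loop always outputs exactly k lines: once the number of unexamined lines equals the number still needed, select / remaining is 1 and every remaining line is taken; once k lines are chosen, select is 0 and nothing more is taken. This technique is usually called selection sampling (Knuth's Algorithm S). Below is a minimal sketch of the same loop as a Python 3 function, so it can be tried without a file; the name `selection_sample` and the injectable `rng` parameter are hypothetical additions for illustration.

```python
import random
from typing import Callable, List, Sequence


def selection_sample(lines: Sequence[str], k: int,
                     rng: Callable[[], float] = random.random) -> List[str]:
    """Choose k lines uniformly at random, preserving their order.

    Hypothetical helper: same logic as the script, but on an in-memory
    sequence so the total count n is known up front.
    """
    n = len(lines)
    if k >= n:
        return list(lines)
    chosen = []
    select, remaining = k, n
    for line in lines:
        # rng() is in [0, 1): when select == remaining the test always
        # passes, and when select == 0 it always fails.
        if rng() < select / remaining:
            chosen.append(line)
            select -= 1
        remaining -= 1
    return chosen
```

Passing `rng=lambda: 0.0` makes it take the first k lines, which is a quick way to check the bookkeeping deterministically.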
Next I'll think about a solution that avoids reading the file twice...
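One standard single-pass approach is reservoir sampling (Algorithm R), a different technique from selection sampling: it keeps k candidates and lets line i (counting from 0) replace a random one with probability k/(i+1), so every line ends up in the sample with probability k/n without knowing n in advance. Unlike the two-pass version it does need memory for the k chosen lines. A minimal sketch follows, assuming Python 3; the name `reservoir_sample` and the `seed` parameter are hypothetical, and the (index, line) pairs are sorted at the end only so the lines come out in file order.

```python
import random
from typing import Iterable, List, Optional, Tuple


def reservoir_sample(lines: Iterable[str], k: int,
                     seed: Optional[int] = None) -> List[str]:
    """Return up to k lines chosen uniformly at random in one pass.

    Memory is O(k) and the input is consumed exactly once, so it also
    works on a file object or a pipe.
    """
    rng = random.Random(seed)
    reservoir: List[Tuple[int, str]] = []  # (line index, line) pairs
    for i, line in enumerate(lines):
        if i < k:
            # Fill the reservoir with the first k lines.
            reservoir.append((i, line))
        else:
            # Replace a random slot with probability k / (i + 1).
            j = rng.randrange(i + 1)  # uniform integer in [0, i]
            if j < k:
                reservoir[j] = (i, line)
    # Indices are unique, so sorting orders by position in the input.
    return [line for _, line in sorted(reservoir)]
```

For example, `with open(filename) as f: sys.stdout.writelines(reservoir_sample(f, k))` reads the file only once.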
Bye,
bearophile
More information about the Digitalmars-d mailing list