random k-sample of a file

bearophile bearophileHUGS at lycos.com
Thu Oct 9 12:29:38 PDT 2008


I am not reading the answers written by others, of course :-) With the help of "Programming Pearls", here is my second version. It does not store the chosen lines, so this code runs with very little memory:

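from sys import argv, stdout
from random import random

filename = argv[1]
k = int(argv[2])

# First pass: count the lines in the file.
nlines = sum(1 for _ in open(filename))

if k >= nlines:
    # The file has no more than k lines: output all of them.
    for line in open(filename):
        stdout.write(line)  # line already ends with a newline
else:
    select = k          # lines still to be chosen
    remaining = nlines  # lines not yet examined

    # Second pass: keep each line with probability select / remaining.
    for line in open(filename):
        if random() < float(select) / remaining:
            stdout.write(line)
            select -= 1
        remaining -= 1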
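
To make the selection step easier to test, the same idea can be wrapped in a function. What follows is only a minimal sketch in Python 3, not the script above verbatim; the name sample_lines and the rng parameter are assumptions introduced here. Each item is kept with probability select / remaining. Once select equals remaining the ratio is 1.0, so every later item is taken, and exactly k items come out in their original order.

```python
import random


def sample_lines(lines, k, n, rng=random):
    """Yield k items chosen at random from `lines`, which must hold n items.

    Hypothetical helper illustrating selection sampling: nothing is
    stored, so memory use does not depend on k.
    """
    select = min(k, n)   # items still to be chosen
    remaining = n        # items not yet examined
    for line in lines:
        if select == 0:
            break        # sample complete, no need to read further
        # random() is in [0.0, 1.0), so the test always succeeds
        # when select == remaining.
        if rng.random() < select / remaining:
            yield line
            select -= 1
        remaining -= 1
```

With a file, n would come from a first counting pass, for example sum(1 for _ in open(filename)), and the open file would be passed as lines on the second pass.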

I'll think about a solution that avoids reading the file twice then...
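
One well-known single-pass technique is reservoir sampling (often called Algorithm R). The sketch below illustrates that general technique in Python 3; it is not necessarily the solution alluded to above, and the name reservoir_sample and the rng parameter are assumptions. It gives up the memory saving: the k candidate lines have to be held in memory, because the total number of lines is not known until the end. The result is also not in file order.

```python
import random


def reservoir_sample(lines, k, rng=random):
    """Return up to k items chosen uniformly from `lines` in one pass.

    Hypothetical helper illustrating reservoir sampling.
    """
    reservoir = []
    for i, line in enumerate(lines):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(line)
        else:
            # Item i (0-based) replaces a random slot with
            # probability k / (i + 1).
            j = rng.randrange(i + 1)
            if j < k:
                reservoir[j] = line
    return reservoir
```

If the input has fewer than k lines, all of them are returned.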

Bye,
bearophile


