random k-sample of a file

Andrei Alexandrescu SeeWebsiteForEmail at erdani.org
Thu Oct 9 18:21:02 PDT 2008


Carlos wrote:
> : You can't do a uniform random distribution without knowing the length.
> 
> Probably true for other distributions.
> Most certainly not true for a uniform distribution (with a reasonable k).
> 
> You can work on a subset of the file, let's say 1000 records.
> The distribution being uniform, you can select (or eliminate)
> a percentage of each subset, and the results for the whole file
> will be OK.

I think you can do even nonuniform distributions. The number of samples 
seen should not influence your subsampling decision.
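
As a rough illustration of a selection rule that ignores how many records
have been seen so far, here is a minimal Python sketch. It assumes the
simplest such rule, keeping each record independently with a fixed
probability p. The name bernoulli_sample and the parameters p and rng are
made up for this example and do not come from the thread or from any
library.

```python
import random

def bernoulli_sample(records, p, rng=None):
    """Yield each record independently with probability p.

    The decision for a record depends only on p, never on how many
    records have been read so far, so the total length is not needed.
    """
    rng = rng or random.Random()
    for record in records:
        # random() returns a float in [0.0, 1.0), so p = 1.0 keeps
        # everything and p = 0.0 keeps nothing.
        if rng.random() < p:
            yield record

if __name__ == "__main__":
    # Hypothetical usage: keep roughly 10% of 1000 records.
    kept = list(bernoulli_sample(range(1000), 0.1))
    print(len(kept), "records kept")
```

Note that this yields a sample whose size is only about p times the
number of records, not exactly k. Getting exactly k items in a single
pass is usually done with reservoir sampling, where the running count
does enter the decision.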

Andrei


