random k-sample of a file

Andrei Alexandrescu SeeWebsiteForEmail at erdani.org
Thu Oct 9 18:25:11 PDT 2008


Andrei Alexandrescu wrote:
> Carlos wrote:
>> : You can't do a uniform random distribution without knowing the length.
>>
>> Probably true for other distributions.
>> Most certainly not true for a uniform distribution (with a reasonable k).
>>
>> You can work on a subset of the file. Let's say 1000 records.
>> The distribution being uniform, you can select (or eliminate)
>> a percentage of each subset, and the results for the whole file
>> will be OK.
> 
> I think you can do even nonuniform distributions. The number of samples 
> seen should not influence your subsampling decision.

I think "the number of total samples..." is the correct statement.

Andrei
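
For illustration, here is a minimal sketch of the block-wise idea Carlos describes above: read the file in blocks (say 1000 records) and keep a fixed percentage of each block, so the total length never has to be known in advance. It is not code from the thread. The names sample_per_block, block_size and fraction are hypothetical, and Python is used only for brevity.

```python
import random
from itertools import islice

def sample_per_block(records, block_size=1000, fraction=0.1, rng=None):
    """Yield roughly `fraction` of the records, chosen block by block.

    Hypothetical helper: `records` is any iterable (e.g. the lines of a file),
    so the total number of records is never needed.
    """
    rng = rng or random.Random()
    it = iter(records)
    while True:
        # Pull the next block; the final block may be shorter than block_size.
        block = list(islice(it, block_size))
        if not block:
            return
        # How many records to keep from this block.
        keep = round(len(block) * fraction)
        # Pick `keep` distinct records uniformly, without replacement.
        yield from rng.sample(block, keep)

if __name__ == "__main__":
    picked = list(sample_per_block(range(2500), block_size=1000, fraction=0.1))
    print(len(picked))
```

Note that under these assumptions the size of the result grows with the file (a fixed fraction, not a fixed k), and each block contributes a fixed share, so this is a stratified sample rather than a simple random k-subset of the whole file.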


