randomSample

BCS none at anon.com
Sat May 23 18:19:54 PDT 2009


Hello Andrei,

> I'm writing a range to select a random sampling of a forward range.
> The number of elements in the forward range is already known.
> 
> The strategy is: at any moment, I know I need to select k random
> samples out of n. To implement popFront, I generate a random number in
> [0, n - k]. That's the probability that the next item in the input
> will be part of the selection. If the random number is 0, then the
> element got selected. Otherwise, I must ignore that guy so I popFront
> the input range, decrease n by one, and repeat.
> 
> This seems reasonable but somehow the selection comes skewed: the
> elements towards the beginning of the range are less represented than
> the ones towards the end. Where am I wrong?
> 
> Andrei
> 

Check my reasoning on this:

It will be skewed because as soon as the system gets behind (and statistically 
it will half the time) the elements start being selected with a higher probability.

Take the related case, select each element with equal probability until you 
have enough selected or have just enough left to get the correct amount. 
The last element will get selected about half the time no matter what percentage 
you want.





More information about the Digitalmars-d mailing list