randomSample

Andrei Alexandrescu SeeWebsiteForEmail at erdani.org
Sat May 23 18:48:36 PDT 2009


Jason House wrote:
> Andrei Alexandrescu Wrote:
> 
>> I'm writing a range to select a random sampling of a forward range.
>> The number of elements in the forward range is already known.
>> 
>> The strategy is: at any moment, I know I need to select k random
>> samples out of n. To implement popFront, I generate a random number
>> in [0, n - k]. That's the probability that the next item in the
>> input will be part of the selection. If the random number is 0,
>> then the element got selected. Otherwise, I must ignore that guy so
>> I popFront the input range, decrease n by one, and repeat.
>> 
>> This seems reasonable but somehow the selection comes skewed: the 
>> elements towards the beginning of the range are less represented
>> than the ones towards the end. Where am I wrong?
>> 
>> 
>> Andrei
> 
> You have your probability wrong. The selection probability is k/n.
> Pick a number in [0,n) and pick the element if the result is in
> [0,k). If you do select an item, decrement k. Always decrement n when
> popping, regardless of if the element is selected.

Great, thanks, that works perfect. If I want to choose _toSelect stuff 
out of _available items, this works to position to the next element:

         for (;;)
         {
             auto r = uniform(0, _available);
             if (r < _toSelect)
             {
                 // chosen!
                 return;
             }
             // not chosen, retry
             assert(!_input.empty);
             _input.popFront;
             --_available;
             assert(_available > 0);
         }

As an aside, it looks like the right-open range for random numbers is a 
good default.


Andrei



More information about the Digitalmars-d mailing list