Random sampling in Phobos

Tue Apr 17 22:45:08 PDT 2012

On Wednesday, 18 April 2012 at 01:17:18 UTC, Joseph Rushton 
Wakeling wrote:
> On 17/04/12 18:25, Joseph Rushton Wakeling wrote:
>> On 17/04/12 17:31, Andrei Alexandrescu wrote:
>>> Actually that's not correct. RandomSample works fine with an 
>>> input range and
>>> does not keep it in memory.
>>
>> Ahh, OK. I should have anticipated this as the output also 
>> works as a range.
>
> A query on this point, as it looks like this can have some 
> unfortunate side-effects.
>
> If I write e.g.
>
>     auto s = randomSample(iota(0,100), 5);
>
>     foreach(uint i; s)
>         writeln(i);
>
>     writeln();
>
>     foreach(uint i; s)
>         writeln(i);
>
> ... then it's noticeable that the _first_ element of the two 
> sample lists is identical, while the others change.  If I put 
> in place a specified source of randomness, e.g.
>
>     auto s = randomSample(iota(0,100), 5, rndGen);
>
> ... then the numbers stay the same for both lists.
>
> I'm presuming that the first element remains identical because 
> it's defined in the constructor rather than elsewhere, but it's 
> not clear why passing a defined RNG produces identical output 
> on both occasions.
>
> The effect can be worse with Vitter's Algorithm D because 
> often, having defined the first element, the second may be 
> derived deterministically -- meaning the first 2 elements of 
> the sample are identical.
>
> I'm sure that the above use of the RandomSample struct is not 
> the advised use, but it's still disconcerting to see this.

I've run into this trap more than once. :)  You have to pass the 
random number generator by ref, otherwise you are just generating 
two identical sequences of random numbers.  Just change the 
randomSample signature to:

     auto randomSampleVitter(R, Random)(R r, size_t n, ref Random 
gen)

-Lars