Random sampling in Phobos

Joseph Rushton Wakeling joseph.wakeling at webdrake.net
Tue Apr 17 15:36:55 PDT 2012


On 17/04/12 18:25, Joseph Rushton Wakeling wrote:
> On 17/04/12 17:31, Andrei Alexandrescu wrote:
>> Actually that's not correct. RandomSample works fine with an input range and
>> does not keep it in memory.
>
> Ahh, OK. I should have anticipated this as the output also works as a range.

A query on this point, as it looks like this can have some unfortunate side-effects.

If I write e.g.

     auto s = randomSample(iota(0,100), 5);

     foreach(uint i; s)
         writeln(i);

     writeln();

     foreach(uint i; s)
         writeln(i);

... then it's noticeable that the _first_ element of the two sample lists is 
identical, while the others change.  If I put in place a specified source of 
randomness, e.g.

     auto s = randomSample(iota(0,100), 5, rndGen);

... then the numbers stay the same for both lists.

I'm presuming that the first element remains identical because it's defined in 
the constructor rather than elsewhere, but it's not clear why passing a defined 
RNG produces identical output on both occasions.

The effect can be worse with Vitter's Algorithm D because often, having defined 
the first element, the second may be derived deterministically -- meaning the 
first 2 elements of the sample are identical.

I'm sure that the above use of the RandomSample struct is not the advised use, 
but it's still disconcerting to see this.


More information about the Digitalmars-d mailing list