Random sampling in Phobos
Lars T. Kyllingstad
public at kyllingen.net
Tue Apr 17 22:45:08 PDT 2012
On Wednesday, 18 April 2012 at 01:17:18 UTC, Joseph Rushton
Wakeling wrote:
> On 17/04/12 18:25, Joseph Rushton Wakeling wrote:
>> On 17/04/12 17:31, Andrei Alexandrescu wrote:
>>> Actually that's not correct. RandomSample works fine with an
>>> input range and
>>> does not keep it in memory.
>>
>> Ahh, OK. I should have anticipated this as the output also
>> works as a range.
>
> A query on this point, as it looks like this can have some
> unfortunate side-effects.
>
> If I write e.g.
>
> auto s = randomSample(iota(0,100), 5);
>
> foreach(uint i; s)
> writeln(i);
>
> writeln();
>
> foreach(uint i; s)
> writeln(i);
>
> ... then it's noticeable that the _first_ element of the two
> sample lists is identical, while the others change. If I put
> in place a specified source of randomness, e.g.
>
> auto s = randomSample(iota(0,100), 5, rndGen);
>
> ... then the numbers stay the same for both lists.
>
> I'm presuming that the first element remains identical because
> it's defined in the constructor rather than elsewhere, but it's
> not clear why passing a defined RNG produces identical output
> on both occasions.
>
> The effect can be worse with Vitter's Algorithm D because
> often, having defined the first element, the second may be
> derived deterministically -- meaning the first 2 elements of
> the sample are identical.
>
> I'm sure that the above use of the RandomSample struct is not
> the advised use, but it's still disconcerting to see this.
I've run into this trap more than once. :) You have to pass the
random number generator by ref, otherwise you are just generating
two identical sequences of random numbers. Just change the
randomSample signature to:
auto randomSampleVitter(R, Random)(R r, size_t n, ref Random
gen)
-Lars
More information about the Digitalmars-d
mailing list