RandomSample with specified random number generator

Sun Jun 17 09:51:55 PDT 2012

On 17/06/12 17:08, Artur Skawina wrote:
> The bug description and cause make sense, thanks for the explanation.
>
> But the problem is that this kind of bug inside a module which is supposed
> to generate pseudo-random data makes it very hard to trust _any_ result
> given back by the code...

Sure.  When I was working on this I did spend a fair amount of time scratching 
my head over how you could create _really_ effective unittests for random-number 
functionality without stretching out the time required too greatly.  I don't 
think sufficient tests are in place, though really rigorous tests of 
pseudo-random number generation would probably take longer than unittests are 
supposed to.

> So let's fix the already discovered bug:

I'm feeling a bit braindead today so I may have misunderstood your code, but I'm 
not sure your fix actually solves the identified problem.  You could check out 
Jerro's pull request for an alternative:
https://github.com/D-Programming-Language/phobos/pull/542

> Now the result is:
>
>     [0, 7568, 7476, 0, 7494, 7500, 7461, 7504, 7527, 7470]
>
> i.e. still not quite what you'd expect...

If you've got time, you might like to pull from my master branch:
https://github.com/WebDrake/phobos

... and check whether the same bug arises.  I made exactly this kind of test.
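A frequency test of this kind could look roughly like the D sketch below.  It 
is only an illustration under stated assumptions: `selectionSample` and 
`selectionCounts` are hypothetical helpers, and the sampler is a hand-written 
version of Knuth's selection sampling (Algorithm S), not Phobos's own 
`randomSample`.  The idea is that, over many samples of `n` items out of 
`total`, every item should be picked roughly `trials * n / total` times, so a 
count stuck at 0 (like the first and fourth entries in the result quoted above) 
flags a bug.

```d
import std.random : Random, uniform;

/// Hypothetical hand-written selection sampling (Knuth's Algorithm S):
/// picks exactly `n` of the indices 0 .. total - 1, in increasing order.
/// `gen` is taken by `ref`, so every call advances the caller's generator.
size_t[] selectionSample(ref Random gen, size_t n, size_t total)
{
    size_t[] picked;
    size_t needed = n;
    foreach (i; 0 .. total)
    {
        if (needed == 0)
            break;
        // Keep index i with probability needed / (number of indices left).
        if (uniform(0.0, 1.0, gen) * (total - i) < needed)
        {
            picked ~= i;
            --needed;
        }
    }
    return picked;
}

/// Hypothetical frequency test: draw `trials` samples from one seeded
/// generator and count how often each index is picked.  For an unbiased
/// sampler every count should be close to trials * n / total.
size_t[] selectionCounts(uint seed, size_t n, size_t total, size_t trials)
{
    auto gen = Random(seed);
    auto counts = new size_t[total];
    foreach (_; 0 .. trials)
        foreach (i; selectionSample(gen, n, total))
            ++counts[i];
    return counts;
}
```

With, say, `selectionCounts(42, 5, 10, 10_000)` each of the 10 counts should 
land near 5,000; the exact values depend on the seed but are repeatable.  A 
unittest can then check that every count lies within a generous tolerance, 
which keeps the run short while still catching gross bias.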

> This is *not* a RNG, it's a PRNG - the results must always be completely
> repeatable, just like you say in your first message. Of course having
> a mode that improves the randomness is ok and should probably even be the
> default. But if a PRNG is seeded with a known value then it must behave
> completely predictably.

I think you've slightly misunderstood what I meant.  Let's say we create a 
random sample range:

     auto sample = randomSample(/* whatever input */);

... then there are two perfectly logical and acceptable ways to handle its lazy 
evaluation.

The first is that every evaluation produces exactly the same result, i.e.

     writeln(sample);
     writeln(sample);
     writeln(sample);

... will produce identical output 3 times.  The alternative is that a new random 
sample is generated on every evaluation, i.e. each time we

     writeln(sample);

... we get a different sample.  This is still predictable, because the samples 
will derive from the same sequence of pseudo-random numbers, each new sample 
picking up the pseudo-random sequence where the last one left off.  Assuming the 
sequence approximates randomness well enough, you'll get properly independent 
samples each time.
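A minimal D sketch of the two behaviours, assuming nothing about how 
`randomSample` is actually implemented: `draw`, `RepeatableDraw` and 
`ContinuingDraw` are hypothetical names, and drawing a few values with `uniform` 
stands in for generating a sample.  Behaviour 1 keeps a private copy of the 
seeded generator and restarts from it on every evaluation; behaviour 2 shares 
one generator through a pointer, so each evaluation continues the sequence.  
Both are fully deterministic for a fixed seed.

```d
import std.random : Random, uniform;

/// Hypothetical stand-in for evaluating a lazy random range: draws `k`
/// values in [0, 100) from whatever generator it is handed.
int[] draw(ref Random gen, size_t k)
{
    int[] result;
    foreach (_; 0 .. k)
        result ~= uniform(0, 100, gen);
    return result;
}

/// Behaviour 1: store the seeded generator by value and copy it on every
/// evaluation, so each evaluation restarts from the same state and gives
/// identical output.
struct RepeatableDraw
{
    Random start; // never advanced itself
    size_t k;

    int[] evaluate()
    {
        Random gen = start; // struct copy of the full generator state
        return draw(gen, k);
    }
}

/// Behaviour 2: hold a pointer to one generator, so each evaluation picks up
/// the pseudo-random sequence where the previous one left off.
struct ContinuingDraw
{
    Random* gen; // shared state, advanced by every evaluation
    size_t k;

    int[] evaluate()
    {
        return draw(*gen, k);
    }
}
```

For example, `RepeatableDraw(Random(42), 5)` returns the same five values on 
every `evaluate()`, while two `ContinuingDraw` objects pointing at separate 
generators both seeded with 42 return the same *sequence* of different draws.  
The design point is where the generator state lives: since std.random's 
generators are structs, a generator that is passed or stored by value is 
silently copied, which is one plausible way for the explicit-RNG and default-RNG 
code paths to end up behaving differently.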

To me either of these possibilities is acceptable -- they're both logical and 
predictable -- but the behaviour should be the same whether or not randomSample 
is called with a specific RNG.