RandomSample with specified random number generator
Joseph Rushton Wakeling
joseph.wakeling at webdrake.net
Sun Jun 17 09:51:55 PDT 2012
On 17/06/12 17:08, Artur Skawina wrote:
> The bug description and cause makes sense, thanks for the explanation.
>
> But the problem is that this kind of bug inside a module which is supposed
> to generate pseudo-random data makes it very hard to trust _any_ result
> given back by the code...
Sure. When I was working on this I did spend a fair amount of time scratching
my head over how you could create _really_ effective unittests for random-number
functionality, without stretching out the time required too greatly. I don't
think sufficient tests are in place, though really rigorous tests of
pseudo-random number generation would probably take longer than unittests are
supposed to run.
> So let's fix the already discovered bug:
I'm feeling a bit braindead today so I may have misunderstood your code, but I'm
not sure your fix actually does fix the problem identified. You could check out
Jerro's pull request for an alternative:
https://github.com/D-Programming-Language/phobos/pull/542
> Now the result is:
>
> [0, 7568, 7476, 0, 7494, 7500, 7461, 7504, 7527, 7470]
>
> ie still not quite what you'd expect...
If you've got time, you might like to pull from my master branch:
https://github.com/WebDrake/phobos
... and check if the same bug arises. I made exactly this kind of test.
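For anyone wanting to reproduce this kind of check, here is a minimal sketch of
a frequency test. It is not the test from the branch above: countPicks and the
trial count are made up for illustration, and it calls randomSample with the
default generator. When picking 5 of 10 elements, each element should be chosen
in roughly half of the trials, so a count of 0 stands out immediately.

```d
import std.random : randomSample;
import std.range : iota;
import std.stdio : writeln;

// Hypothetical helper (not part of Phobos): count how often each of
// 10 elements is picked when a sample of 5 is drawn `trials` times.
size_t[10] countPicks(size_t trials)
{
    size_t[10] counts;
    foreach (t; 0 .. trials)
    {
        // each call builds a fresh sample using the default generator
        foreach (x; randomSample(iota(10), 5))
            ++counts[x];
    }
    return counts;
}

void main()
{
    auto counts = countPicks(15_000);
    writeln(counts);
    // Loose sanity bound around the expected 7500 picks per element;
    // this is an illustrative tolerance, not a rigorous statistical test.
    foreach (c; counts)
        assert(c > 7_000 && c < 8_000);
}
```

The same loop can be repeated with a specified RNG passed as a third argument to
compare the two code paths.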
> This is *not* a RNG, it's a PRNG - the results must always be completely
> repeatable, just like you say in your first message. Of course having
> a mode that improves the randomness is ok and should probably even be the
> default. But if a PRNG is seeded with a known value then it must behave
> completely predictable.
I think you've slightly misunderstood what I meant. Let's say we create a
random sample range:
    auto sample = randomSample(/* whatever input */);
... then there are two perfectly logical and acceptable ways to handle its lazy
evaluation.
The first is that each time you evaluate it produces the exact same result, i.e.
    writeln(sample);
    writeln(sample);
    writeln(sample);
... will produce identical output 3 times. The alternative is that each time a
new random sample is generated, i.e. each time we
    writeln(sample);
... we get a different sample. This is still predictable, because the samples
will derive from the same sequence of pseudo-random numbers, each new sample
picking up the pseudo-random sequence where the last one left off. Assuming the
sequence's approximation of randomness to be good enough, you'll get properly
independent samples each time.
To me either of these possibilities is acceptable -- they're both logical and
predictable -- but the behaviour should be the same whether or not randomSample
is called with a specific RNG.
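
The two behaviours can be sketched with a stand-in sampler. drawSample below is
a hypothetical helper using simple selection sampling, not the Phobos
implementation; the only point is how the generator state is handled. Starting
each evaluation from a copy of the seeded Mt19937 gives identical samples, while
passing the same generator by ref makes each sample continue the sequence.

```d
import std.random : Mt19937, uniform;
import std.stdio : writeln;

// Hypothetical helper (not Phobos code): pick k indices out of 0 .. n by
// selection sampling, advancing the caller's generator through `ref`.
size_t[] drawSample(Rng)(size_t n, size_t k, ref Rng rng)
{
    size_t[] picked;
    foreach (i; 0 .. n)
    {
        // keep index i with probability (still needed) / (still remaining)
        if (uniform(cast(size_t) 0, n - i, rng) < k - picked.length)
            picked ~= i;
    }
    return picked;
}

void main()
{
    auto seeded = Mt19937(2012);

    // Behaviour 1: every evaluation starts from a copy of the same state,
    // so the sample is identical each time.
    auto copy1 = seeded;
    auto copy2 = seeded;
    auto s1 = drawSample(100, 5, copy1);
    auto s2 = drawSample(100, 5, copy2);
    assert(s1 == s2);

    // Behaviour 2: every evaluation continues the one shared sequence,
    // so samples generally differ but are reproducible from the seed.
    auto t1 = drawSample(100, 5, seeded);
    auto t2 = drawSample(100, 5, seeded);
    assert(t1 == s1); // the first sample matches the copy-based one
    writeln(t1);
    writeln(t2);
}
```

Mt19937 is a struct, so plain assignment duplicates its state; that is why the
choice between copying and passing by ref decides which behaviour you get.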