RandomSample with specified random number generator

Artur Skawina art.08.09 at gmail.com
Sun Jun 17 09:08:57 PDT 2012


On 06/17/12 16:37, Joseph Rushton Wakeling wrote:
> On 15/06/12 06:58, Artur Skawina wrote:
>> Considering the output of this program:
>>
>>     import std.stdio;
>>     import std.random;
>>
>>     void main() {
>>        foreach (i; 0..20)
>>           writeln(randomSample([0,1,2,4,5,6,7,8,9], 3, Random(unpredictableSeed)));
>>     }
>>
>> I'd say the use of std.random should be disallowed on grounds of safety...
> 
> Yea, it's for a mix of reasons which Jerro and I prepared fixes for.
> 
> In the current RandomSample implementation the first value of the sample is set in the constructor so that when you call .front() there will be a value waiting.  That means that whatever the subsequent lazy evaluation of the sample, the first point will _always_ be the same.
> 
> You wouldn't expect that to be a problem here because you're generating 20 different samples.  But -- nastily -- in the current setup, if you specify a RNG, its value is set only _after_ the RandomSample object has been constructed, meaning this RNG does not affect the generation of the first value.  Hence the output you see.

The bug description and cause makes sense, thanks for the explanation.

But the problem is that this kind of bug inside a module which is supposed
to generate pseudo-random data makes it very hard to trust _any_ result
given back by the code...
Other than the constant first element, does the output of the above program
look correct? It did not for me, so i tried this:

   import std.stdio;
   import std.random;

   void main() {
      long cnts[10];
   
      foreach (i; 0..20000) {
         auto r = randomSample([0,1,2,4,5,6,7,8,9], 3, Random(unpredictableSeed));
         foreach (e; r)
            cnts[e]++;
      }
      writeln(cnts);
   }

which results in:

   [0, 20000, 5749, 0, 5781, 5702, 5692, 5618, 5674, 5784]

So let's fix the already discovered bug:

@@ -1330,7 +1330,8 @@ struct RandomSample(R, Random = void)
     // choice but to store a copy.
     static if(!is(Random == void))
     {
-        Random gen;
+        Random _gen;
+        @property gen(Random g) { _gen = g; prime(); return _gen; }
     }
 
 /**
@@ -1411,7 +1412,7 @@ Returns the index of the visited record.
             }
             else
             {
-                auto r = uniform(0, _available, gen);
+                auto r = uniform(0, _available, _gen);
             }
 
             if (r < _toSelect)

and rerun the program. Now the result is:

   [0, 7568, 7476, 0, 7494, 7500, 7461, 7504, 7527, 7470]

ie still not quite what you'd expect...


Unfortunately this makes it impossible to use std.random in practice,
because you can't tell how many more such problems exist, if my first
two attempts to use the module already uncovered two serious bugs...

> Jerro's fix corrected the copying of RNG.  Mine instead delayed the calculation of the first value so that it was always part of the lazy evaluation.  Which is the better option depends on how you prefer to resolve the inconsistency identified in this discussion thread, i.e. whether the lazy evaluation should always come out to the same set of values, or always different ones.

This is *not* a RNG, it's a PRNG - the results must always be completely
repeatable, just like you say in your first message. Of course having
a mode that improves the randomness is ok and should probably even be the
default. But if a PRNG is seeded with a known value then it must behave
completely predictable.

artur


More information about the Digitalmars-d mailing list