Ranges and random numbers -- again

Joseph Rushton Wakeling joseph.wakeling at webdrake.net
Mon Jun 17 15:01:29 PDT 2013


On 06/17/2013 09:22 PM, Andrei Alexandrescu wrote:
> On 6/17/13 3:32 PM, Joseph Rushton Wakeling wrote:
>>      **************************************************************************
>>      * Reading multiple times from the start of the same random range, should *
>>      * produce different (and statistically independent) results each time.   *
>>      **************************************************************************
> 
> I don't think that's a good rule.
> 
> A range that does that should be an input range, not a forward range. Calling
> .save on a random range and tracing the steps again should produce the same values.

Let me clarify.

I'm _not_ talking about pseudo-random number generating ranges such as
MersenneTwisterEngine, which can indeed be forward ranges and in fact are
deterministic.  What I mean by a "random range" is one where popFront() calls a
source of randomness (real or pseudo) in order to update the value of front().

So, I don't consider MersenneTwister to be a random range -- but RandomSample
is, because it calls uniform() within popFront().

When saying "reading from the start", I also didn't mean to imply that
successive calls to .front should produce different values.  By "reading from
the start" I meant something like this:

    SomeRandomRange r;

    foreach(x; r)
        writeln(r);
    writeln();

    foreach(x; r)
        writeln(r);

... should produce two statistically independent sequences.

RandomSample is interesting to consider in light of your remarks -- its .save
can't in general guarantee the same values moving forward, because it can't
guarantee that the source of randomness can reproduce the same values:

        @property typeof(this) save()
        {
            auto ret = this;
            ret._input = _input.save;
            return ret;
        }

If (and only if) the RandomSample instance contains an internal generator, then
the auto ret = this; line will copy it (by value) and so the saved copy will
reproduce the sequence that would have been produced by the original.

However, if the RandomSample instance was initiated with Random == void in the
template parameters, then it will use rndGen to generate random numbers, and so
the forward sequence will be different each time.  So in this sense the "save"
is meaningless.

Now, we could and maybe should deal with that by forbidding save() in the case
where Random == void, but on the other hand, the idea of a random number
generator copied by value creates other problems to do with statistical safety.
 If I do this:

    auto sample = randomSample(iota(100), 5, rndGen);

... which is an "obvious" thing to do, then the results of that will be
correlated with subsequent things I do involving rndGen -- all because the
generator gets copied by value.

On the other hand, if we assume that the generator is a reference type, then it
will inevitably produce different samples each time it's iterated over.


More information about the Digitalmars-d mailing list