Ranges and random numbers -- again

Tue Jun 18 03:15:51 PDT 2013

On 06/18/2013 10:30 AM, Joseph Rushton Wakeling wrote:
> I don't come to that conclusion because I _want_ random ranges to be
> un-.save-able, but because I think without that design choice, there will simply
> be too many ways to unknowingly generate unwanted correlations in
> random-number-using programs.
> 
> I'll follow up on that point later today.

Just as a simple example -- and this involving purely pseudo-random number
generation, not "random ranges" as I've conceived them [*] -- consider the
following:

    auto t1 = rndGen.take(5);
    writeln(t1);

    auto t2 = rndGen.take(5);
    writeln(t2);

I'd expect that to produce two distinct sequences.  In fact it produces two
identical sequences.

I think this must be because, passed a forward range like Mt19937,
std.range.Take uses this constructor:

    this(R input) { _original = input; _current = input.save; }

... and so when rndGen.take(5) is consumed, rndGen itself is not iterated forward.
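
A minimal, self-contained sketch of the effect follows. It assumes only that Mt19937 is a struct and that take receives it by value, so whichever constructor is involved, each take works on a copy of the generator state. The seed 42 and the helper drawAdvancing are hypothetical, introduced purely for illustration; the helper is one possible way to advance the generator in place, not a recommended API.

```d
import std.array : array;
import std.random : Mt19937;
import std.range : take;
import std.stdio : writeln;

// Hypothetical helper (not part of Phobos): draws n values while
// advancing the generator itself, because gen is passed by ref.
uint[] drawAdvancing(ref Mt19937 gen, size_t n)
{
    uint[] values;
    foreach (i; 0 .. n)
    {
        values ~= gen.front;
        gen.popFront();
    }
    return values;
}

void main()
{
    auto gen = Mt19937(42);  // arbitrary fixed seed, an assumption

    // take() gets a copy of gen, so gen is left untouched and both
    // calls start from the same generator state.
    auto a = gen.take(5).array;
    auto b = gen.take(5).array;
    writeln("take, twice:          ", a == b ? "identical" : "distinct");

    // Advancing by reference moves gen forward between the two draws.
    auto c = drawAdvancing(gen, 5);
    auto d = drawAdvancing(gen, 5);
    writeln("drawAdvancing, twice: ", c == d ? "identical" : "distinct");
}
```

The first pair compares equal because neither take touches gen. The second pair differs because the helper consumes values from gen directly, which is the behaviour I'd naively expect from the original snippet.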

To me that's a very serious potential source of statistical errors in a program,
and it's particularly bad because it looks innocuous -- absent the writeln of the
results, it would be easy to never realize what's happening here.

Now imagine the potential consequences for D's use in scientific simulation -- I
don't want to see D get raked over the coals because some researcher's published
simulation results turned out to be spurious ... :-)

I'd be very happy to see a way in which this issue can be resolved while
preserving the opportunity to .save, but I don't personally see one that doesn't
rely on a heavy amount of user virtue to avoid these statistical pitfalls.

[* Actually, rndGen.take(n) is arguably a random range according to my
definition, because rndGen.take(n).popFront() will request a (pseudo-)random
number.]