Ranges and random numbers -- initializing .front and related values

Wed Jun 19 03:12:17 PDT 2013

On 06/18/2013 07:21 PM, Joseph Rushton Wakeling wrote:
> I can accept that as a design principle, that iterating over a range twice is in
> general considered as undefined behaviour.  It's certainly something that I've
> implicitly accepted in my own code, as I don't think I use anything akin to the
> above in "real" applications.
> 
> Now that said, I think there's also a value in the definition of a random range
> as being one where iterating over it multiple times generates statistically
> independent outcomes, and I'm curious whether one can reliably achieve that
> design goal.

I'm a little concerned here because when the idea of an "if (_first)" lazy
evaluation was put in RandomSample.front, nobody dissented:
https://github.com/D-Programming-Language/phobos/pull/553#commits-pushed-205c3bf

I might not have made that particular change if I'd been advised that random
samples should be treated as consume-once entities.

That said, I think there's a benefit to the lazy approach.  If we start with
the assumption that we're generating a lazily-evaluated random sequence,
consider the following, using the simpleRandomRange I described in the previous
discussion:

    auto gen1 = new MtClass19937(1001);
    real x1 = uniform(0.0L, 1.0L, gen1);
    auto r1 = simpleRandomRange(0.0L, 1.0L, gen1);
    writeln(r1.take(5));

    auto gen2 = new MtClass19937(1001);
    auto r2 = simpleRandomRange(0.0L, 1.0L, gen2);
    real x2 = uniform(0.0L, 1.0L, gen2);
    writeln(r2.take(5));

These will produce sequences that differ only in the _first_ entry:

    [0.379529, 0.265064, 0.551418, 0.19606, 0.227684]
    [0.306232, 0.265064, 0.551418, 0.19606, 0.227684]

... because the initial value gets set in the constructor, and so the first
entry depends on whether the call to uniform() happens before or after the range
is constructed.  It's maybe a small thing, but it feels more natural to me that
all the randomness should happen at the point where the range is consumed,
rather than at its point of construction.
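
To make the lazy alternative concrete, here is a minimal sketch.  It assumes a
hypothetical LazyUniformRange that holds a pointer to a std.random Mt19937; it
is not the actual RandomSample code, nor the simpleRandomRange from the previous
discussion, just an illustration of the "if (_first)" idea:

    import std.random : Mt19937, uniform;
    import std.range : take;
    import std.stdio : writeln;

    // Hypothetical type, for illustration only.
    struct LazyUniformRange
    {
        private Mt19937* _gen;      // pointer, so the generator is shared
        private double _value;
        private bool _first = true;

        this(Mt19937* gen) { _gen = gen; }  // draws nothing here

        enum bool empty = false;    // infinite range

        // Draw the initial variate on first use, not at construction.
        private void prime()
        {
            if (_first)
            {
                _value = uniform(0.0, 1.0, *_gen);
                _first = false;
            }
        }

        // Cannot be const or pure: the first call advances the generator.
        @property double front() { prime(); return _value; }

        void popFront()
        {
            prime();                // keep the sequence aligned if front was skipped
            _value = uniform(0.0, 1.0, *_gen);
        }
    }

    void main()
    {
        auto gen = Mt19937(1001);
        auto r = LazyUniformRange(&gen);    // no variate consumed yet
        auto x = uniform(0.0, 1.0, gen);    // takes the generator's first variate
        writeln(x);
        writeln(r.take(5));                 // starts from the second variate
    }

With this arrangement the contents of the range depend only on the generator's
state when consumption begins, not on where the range was constructed relative
to other calls to uniform().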

On the other hand, I don't like the idea of front losing its const pure nothrow
properties. :-(

A while ago, in response to a remark of Andrei's about the possibility of a
.finish() method for output ranges, I made an off-the-cuff suggestion that
random ranges might have a .start() method that would be called at the beginning
of the consumption of the range:
http://forum.dlang.org/post/mailman.1227.1344816494.31962.digitalmars-d@puremagic.com

Could such a thing be feasible?