Ranges and random numbers -- initializing .front and related values

Tue Jun 18 11:21:03 PDT 2013

On 06/18/2013 05:29 PM, H. S. Teoh wrote:
> Actually, I question the test code:
> 
>      auto gen = new MtClass19937(unpredictableSeed);
>      auto r = remainder(5, gen);
>      writeln(r);
>      writeln(r);
> 
> I think there may be some misunderstanding of the semantics of r here.
> This may have been caused by the fact that arrays are "anomalous" ranges
> (as Jonathan put it), in that you can iterate over them multiple times
> and get the same results. However, generally, wrapper ranges are not
> designed to be iterated more than once; once the range has been
> consumed, no assumption can be made about it afterwards. That is to say,
> unless the range-wrapping function returns a forward range, they can
> only be iterated over once.

I can accept that as a design principle: iterating over a range twice is, in
general, undefined behaviour.  It's certainly something that I've implicitly
accepted in my own code, as I don't think I use anything akin to the above in
"real" applications.
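
One way to make that assumption explicit rather than implicit is a template
constraint.  The sketch below is only an illustration: twoPasses is a
hypothetical helper name, not something from Phobos or from the code under
discussion.  It refuses to compile unless the range is a forward range, and it
uses .save so that the first pass cannot disturb the second.

```d
import std.array : array;
import std.range.primitives : isForwardRange, save;

/// Hypothetical helper: iterate the same range twice, but only when the
/// type promises (via isForwardRange) that this is meaningful.
auto twoPasses(R)(R r)
    if (isForwardRange!R)
{
    auto first = r.save.array;  // pass 1 runs over an independent copy
    auto second = r.array;      // pass 2 runs over the original
    return [first, second];
}
```

With an array, or with something like Mt19937(seed).take(5), both passes should
see the same elements.  A pure input range would be rejected at compile time
instead of silently producing whatever is left over.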

Now that said, I think there's also value in defining a random range as one
where iterating over it multiple times generates statistically independent
outcomes, and I'm curious whether one can reliably achieve that design goal.

> The second writeln above is therefore treading on dangerous ground; the
> correct approach is to decide whether you want to iterate over the same
> range twice:
> 
>      auto gen = new MtClass19937(unpredictableSeed);
>      auto r = remainder(5, gen);
>      writeln(r.save);		// assuming r is a forward range
>      writeln(r);
> 
> or, if you want two *different* sequences to be extracted from gen:
> 
>      auto gen = new MtClass19937(unpredictableSeed);
>      writeln(remainder(5, gen));
>      writeln(remainder(5, gen));	// call remainder() twice to
> 					// extract 5 elements from gen
> 					// twice
> 
> IOW, the range returned by remainder() is not to be treated as some kind
> of "repeatable data source" in the sense that iterating it multiple
> times will repeat the behaviour of taking 5 elements from gen and doing
> stuff to them.  Instead, it should be treated as a single-use data
> source, that gets consumed by the first writeln(), so the second
> writeln() is, strictly speaking, invalid.

Your second example corresponds to what I would do in practice if I were, for
example, generating many random samples.

However, suppose that you have an entity that is expensive to construct.  Then
there may be value in being able to do something like,

    auto r = someRandomRange(.....);
    foreach(i; iota(1_000_000))
    {
        processResultsOf(r);
    }

... as opposed to,

    foreach(i; iota(1_000_000))
    {
        auto r = someRandomRange(.....);
        processResultsOf(r);
    }

Despite the difficulties I've identified in my previous email, I still think it
ought to be possible to derive a random range design that can do that reliably.
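
To make the idea concrete, here is a minimal sketch of one possible shape for
such a design.  Everything in it is an assumption for illustration: DiceRolls,
Pass and sumOf are made-up names, and the "heavy" setup is reduced to storing a
pointer and a count.  The object itself is not a range; each call to opSlice
hands out a fresh single-pass range that draws from the shared generator and
initializes its own .front at that moment.

```d
import std.random : Mt19937, uniform;

/// Hypothetical re-usable source: built once, sliced once per pass.
struct DiceRolls
{
    Mt19937* gen;   // shared generator, referenced rather than copied
    size_t count;   // number of rolls per pass

    /// Single-pass range produced for each iteration.
    static struct Pass
    {
        Mt19937* gen;
        size_t left;
        int current;

        @property bool empty() const { return left == 0; }
        @property int front() const { return current; }

        void popFront()
        {
            assert(left > 0, "popFront on an empty Pass");
            --left;
            if (left > 0)
                current = uniform(1, 7, *gen);  // next roll in 1 .. 6
        }
    }

    /// Each call starts a new pass and draws its first value immediately.
    Pass opSlice()
    {
        auto p = Pass(gen, count, 0);
        if (count > 0)
            p.current = uniform(1, 7, *gen);
        return p;
    }
}

/// Stand-in for processResultsOf: just sums one pass.
int sumOf(R)(R r)
{
    int total = 0;
    foreach (x; r)
        total += x;
    return total;
}
```

The loop from above would then read foreach (i; iota(1_000_000)) {
sumOf(rolls[]); }, with rolls constructed once outside the loop.  Whether this
counts as "a range that can be iterated many times" or merely as a cheap
factory for single-use ranges is, of course, part of the question.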

> But r being a value type is really a question of implementation; from an
> abstract POV, the user shouldn't need to know whether it's a value type
> or reference type, and thus should *not* assume one way or another. So
> passing r to writeln twice is, strictly speaking, incorrect code. The
> correct way to achieve what you intend in this case, is to call
> remainder() twice, as in my second code snippet above.

Your point that the repeatable behaviour is an artifact of the range being a
value type is a very good one.  Could a reasonable conclusion be that ranges
should be designed to behave _like_ reference types even if they're technically
implemented as structs, precisely in order to prevent the user from doing unsafe
things like multiple iteration passes?
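
As a small illustration of the difference, the sketch below contrasts a wrapper
that stores its own copy of the generator with one that goes through
std.range.refRange.  The function names valuePasses and referencePasses are
invented for this example, and take(5) merely stands in for whatever wrapper is
actually in use.

```d
import std.array : array;
import std.random : Mt19937;
import std.range : refRange, take;

/// By-value wrapper: Take holds a private copy of the generator, and
/// array() consumes a copy of r, so both passes start from the same state.
uint[][] valuePasses(uint seed)
{
    auto gen = Mt19937(seed);
    auto r = gen.take(5);
    return [r.array, r.array];
}

/// Reference-like wrapper: refRange stores a pointer to gen, so every
/// pass advances the one shared generator.
uint[][] referencePasses(uint seed)
{
    auto gen = Mt19937(seed);
    auto r = refRange(&gen).take(5);
    return [r.array, r.array];
}
```

For a given seed, valuePasses should return the same five values twice, while
referencePasses should return the first five values followed by the next five.
Note that even in the second case the Take wrapper is still a struct: its
remaining-element count is copied, which is why the second pass is not simply
empty.  So "behaves like a reference type" would have to be decided for every
piece of state a range carries, not just for the generator.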

On the other hand, is there anything particularly wrong with designing a range
to be able to handle multiple iteration passes?