New Lagged Fib. PRNG gen and random2.d

Thu Aug 29 03:41:59 PDT 2013

On 27/08/13 08:16, monarch_dodra wrote:
> What bothers me about these is that they have a fixed seed value,
> which (IMO) is even worst than default seeding to
> "unpredicatableSeed"
>
> C's "rand" also "defaults seed". I've "tutored" C++ on codeguru
> for years, and years in and years out, I'd get asked "why does my
> random program keep creating the same output". This behavior
> surprises people.
>
> A PRNG should either error-out on no-seed, or be run-time
> unpredictably seeded. The middle ground just gives you the worst
> of both...

For what it's worth, I think that your experience is a consequence of the 
particular case of C.  The "easy" way to generate a random number in C is via 
some variation on rand() / RAND_MAX , so that's what people do, and so they get 
hit by the default seed.

On the other hand with D the default random number generator is rndGen, which is 
unpredictably seeded, and which is used if no other RNG is specified -- so 
_default_ random behaviour in D mimics that in interpreted languages and is 
different per program run.  (Actually, I sometimes find myself seeding rndGen in 
order to get _the same_ results, which of course sometimes you do want:-)

Personally I think there may be some value in having an accepted default 
configuration for RNGs, so long as it's correctly signposted what will happen, 
and so long as the "easy" thing to do is not going to fall into the trap you 
described.

However, I think initialization is an important issue not just for RNGs but for 
the diversity of other entities that use them.  This also impacts on the 
class/struct discussion.

For example, with RNGs per se it's pretty trivial (with a class-based approach) 
that we have a constructor requiring a seed, and in addition (possibly) a 
default constructor that seeds with some default condition (let's leave aside 
preferences on this for now:-).

The latter default-seed approach cannot be implemented with structs -- you can't 
have a no-parameter constructor -- so struct-based RNGs have to work round this 
in one of two ways: either by having default settings for all the internal 
values of the RNG state data (which e.g. Xorshift can do because it's a small 
total number of parameters), or by having conditionals which get triggered when 
front(), popFront() etc. are called, as in the Mersenne Twister implementation, 
which calls seed() if the value of the internal parameter mti is equal to 
size_t.max.

The latter approach is very annoying because it means that e.g. front() cannot 
be const, which we'd like it to be.

So, all of that makes final classes a nice approach, albeit we might compromise 
for other reasons.  But it's not so simple for other random functions.  Now 
consider random sampling, and the following 3 versions of code:

     {   // 1
         auto gen = Random(unpredictableSeed);
         auto sample = randomSample(iota(100), 10, gen);
         writeln(sample);
         writeln(uniform(0.0, 1.0, gen));
     }

     {   // 2
         auto gen = Random(unpredictableSeed);
         auto sample = randomSample(iota(100), 10, gen);
         writeln(uniform(0.0, 1.0, gen));
         writeln(sample);
     }

     {   // 3
         auto gen = Random(unpredictableSeed);
         writeln(uniform(0.0, 1.0, gen));
         auto sample = randomSample(iota(100), 10, gen);
         writeln(sample);
     }

What would you _expect_ to happen to the values of the random number and the 
random sample in each case?

The most intuitive thing would be that the values of the sample are determined 
at the point it's created, so they would depend on where the line sample = ... 
occurs and on nothing else.  However, we know that random samples are lazily 
evaluated.

So, perhaps the second logical alternative is that its values should be 
determined _when it is read_, i.e. at the point where we writeln(sample).

Now consider -- when is the _first_ "front" value of the sample set?  If it's 
set at construction time (which would be normal for a range), the remaining 
values in the sample will depend on what calls to the RNG are put in in-between 
construction and reading.  So, in this case, the samples from programs 1 and 2 
will have the same _first_ value, but different subsequent values.

That, to me, is both unintuitive and undesirable.

Now it gets more complicated.  In the case of RandomSample, we have a bunch of 
different public functions we can call.  When front() is called, we need to 
check if the sample has been initialized before we return a value.  Similarly, 
popFront() should behave differently depending on whether the first sample value 
has been determined yet.  Then there is a method index() that returns the 
numerical index of the sampled value (so, if you sample from 0, 1, 2, .... you 
should have sample.front == sample.index always), and this too clearly depends 
on the sample being initialized before it can return.

None of this can be solved with a struct vs. class difference, assuming we agree 
that it's inherently undesirable to initialize at construction time.

This problem is potentially a rich source of bugs, as we saw with RandomCover in 
our recent pull request discussion.  Relying on manual solutions seems 
undesirable -- you will get people (as I did) fixing front and popFront(), but 
forgetting about a function like RandomSample.index().  (Fix in the works:-)

So, I'm wondering if there is a potential for a syntax sugar like invariant(), 
but which will not get kicked out when compiled with -release and which will 
enable essential checking of the internal state of the system whenever a public 
method is called.

I'd suggest there actually be two, one to check entry condition, one exit.