New Lagged Fib. PRNG gen and random2.d
Joseph Rushton Wakeling
joseph.wakeling at webdrake.net
Thu Aug 29 03:41:59 PDT 2013
On 27/08/13 08:16, monarch_dodra wrote:
> What bothers me about these is that they have a fixed seed value,
> which (IMO) is even worst than default seeding to
> "unpredicatableSeed"
>
> C's "rand" also "defaults seed". I've "tutored" C++ on codeguru
> for years, and years in and years out, I'd get asked "why does my
> random program keep creating the same output". This behavior
> surprises people.
>
> A PRNG should either error-out on no-seed, or be run-time
> unpredictably seeded. The middle ground just gives you the worst
> of both...
For what it's worth, I think that your experience is a consequence of the
particular case of C. The "easy" way to generate a random number in C is via
some variation on rand() / RAND_MAX , so that's what people do, and so they get
hit by the default seed.
On the other hand with D the default random number generator is rndGen, which is
unpredictably seeded, and which is used if no other RNG is specified -- so
_default_ random behaviour in D mimics that in interpreted languages and is
different per program run. (Actually, I sometimes find myself seeding rndGen in
order to get _the same_ results, which of course sometimes you do want:-)
Personally I think there may be some value in having an accepted default
configuration for RNGs, so long as it's correctly signposted what will happen,
and so long as the "easy" thing to do is not going to fall into the trap you
described.
However, I think initialization is an important issue not just for RNGs but for
the diversity of other entities that use them. This also impacts on the
class/struct discussion.
For example, with RNGs per se it's pretty trivial (with a class-based approach)
that we have a constructor requiring a seed, and in addition (possibly) a
default constructor that seeds with some default condition (let's leave aside
preferences on this for now:-).
The latter default-seed approach cannot be implemented with structs -- you can't
have a no-parameter constructor -- so struct-based RNGs have to work round this
in one of two ways: either by having default settings for all the internal
values of the RNG state data (which e.g. Xorshift can do because it's a small
total number of parameters), or by having conditionals which get triggered when
front(), popFront() etc. are called, as in the Mersenne Twister implementation,
which calls seed() if the value of the internal parameter mti is equal to
size_t.max.
The latter approach is very annoying because it means that e.g. front() cannot
be const, which we'd like it to be.
So, all of that makes final classes a nice approach, albeit we might compromise
for other reasons. But it's not so simple for other random functions. Now
consider random sampling, and the following 3 versions of code:
{ // 1
auto gen = Random(unpredictableSeed);
auto sample = randomSample(iota(100), 10, gen);
writeln(sample);
writeln(uniform(0.0, 1.0, gen));
}
{ // 2
auto gen = Random(unpredictableSeed);
auto sample = randomSample(iota(100), 10, gen);
writeln(uniform(0.0, 1.0, gen));
writeln(sample);
}
{ // 3
auto gen = Random(unpredictableSeed);
writeln(uniform(0.0, 1.0, gen));
auto sample = randomSample(iota(100), 10, gen);
writeln(sample);
}
What would you _expect_ to happen to the values of the random number and the
random sample in each case?
The most intuitive thing would be that the values of the sample are determined
at the point it's created, so they would depend on where the line sample = ...
occurs and on nothing else. However, we know that random samples are lazily
evaluated.
So, perhaps the second logical alternative is that its values should be
determined _when it is read_, i.e. at the point where we writeln(sample).
Now consider -- when is the _first_ "front" value of the sample set? If it's
set at construction time (which would be normal for a range), the remaining
values in the sample will depend on what calls to the RNG are put in in-between
construction and reading. So, in this case, the samples from programs 1 and 2
will have the same _first_ value, but different subsequent values.
That, to me, is both unintuitive and undesirable.
Now it gets more complicated. In the case of RandomSample, we have a bunch of
different public functions we can call. When front() is called, we need to
check if the sample has been initialized before we return a value. Similarly,
popFront() should behave differently depending on whether the first sample value
has been determined yet. Then there is a method index() that returns the
numerical index of the sampled value (so, if you sample from 0, 1, 2, .... you
should have sample.front == sample.index always), and this too clearly depends
on the sample being initialized before it can return.
None of this can be solved with a struct vs. class difference, assuming we agree
that it's inherently undesirable to initialize at construction time.
This problem is potentially a rich source of bugs, as we saw with RandomCover in
our recent pull request discussion. Relying on manual solutions seems
undesirable -- you will get people (as I did) fixing front and popFront(), but
forgetting about a function like RandomSample.index(). (Fix in the works:-)
So, I'm wondering if there is a potential for a syntax sugar like invariant(),
but which will not get kicked out when compiled with -release and which will
enable essential checking of the internal state of the system whenever a public
method is called.
I'd suggest there actually be two, one to check entry condition, one exit.
More information about the Digitalmars-d
mailing list