isUniformRNG

Nick Sabalausky via Digitalmars-d digitalmars-d at puremagic.com
Sat May 10 20:58:49 PDT 2014


On 5/10/2014 12:34 PM, Joseph Rushton Wakeling via Digitalmars-d wrote:
 > On 09/05/14 02:42, Nick Sabalausky via Digitalmars-d wrote:
 >> There can technically be multiple "instances", but yea, they're all
 >> effectively
 >> tied together. However, I'm leaning towards the belief that's "correct
 >> behavior"
 >> for a RNG. It's *definitely* correct for a crypto RNG - you certainly
 >> wouldn't
 >> want two crypto RNGs ever generating the same sequence, not even
 >> deliberately.
 >> That would defeat the whole point of a crypto RNG.
 >
 > I'm not sure I follow your point about crypto RNGs here.  Assuming it's
 > deterministic at all, if someone else knows the seed state (which would
 > be required to get two instances generating the same sequences), aren't
 > you buggered anyway?  And isn't the point of a deterministic crypto RNG
 > that its seed state is kept secret, or that it is seeded with
 > non-deterministic values upon instantiation?
 >

The seed doesn't need to be compromised for synchronized RNGs to fubar 
security. It's bad enough just to accidentally generate the same byte 
sequence for multiple things:

struct CryptoRng { ulong internalState; ...etc... }
CryptoRng myRand;

// Oops! Duplicates internal state!
ubyte[] getRandomJunk(CryptoRng rand) {
	return rand.take(64).array();
}

// Oops! These are all identical! Passwords may as well be unsalted!
auto saltForStevensPassword = getRandomJunk(myRand);
auto saltForSallysPassword = getRandomJunk(myRand);
auto saltForGregsPassword = getRandomJunk(myRand);

// Oops! Also identical!
auto oneTimeURL1 = "https://verify-email/a@a.a/"~getRandomJunk(myRand);
auto oneTimeURL2 = "https://verify-email/b@b.b/"~getRandomJunk(myRand);
auto oneTimeURL3 = "https://verify-email/c@c.c/"~getRandomJunk(myRand);

May as well have just defined getRandomJunk as a constant ;)

Maybe not *too* terrible for randomized alien invader attack patterns 
(unless it makes the game boring and doesn't sell), but a fairly major 
failure for security purposes.
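Here's the same failure mode as a self-contained, compilable sketch. The generator is a toy LCG standing in for CryptoRng (names like ToyRng are mine, and it is NOT cryptographically secure - it's just enough state to show the copy bug):

```d
import std.stdio;

// Toy generator standing in for CryptoRng above.  NOT crypto-secure --
// a plain 64-bit LCG, just enough internal state to show the copy bug.
struct ToyRng
{
    ulong state = 42;

    ulong next()
    {
        state = state * 6364136223846793005UL + 1442695040888963407UL;
        return state;
    }
}

// Pass-by-value: the struct (and its internal state) is silently
// copied, so the caller's generator never advances.
ulong[] getRandomJunk(ToyRng rand)
{
    ulong[] result;
    foreach (_; 0 .. 4)
        result ~= rand.next();
    return result;
}

void main()
{
    ToyRng myRand;
    auto saltA = getRandomJunk(myRand);
    auto saltB = getRandomJunk(myRand);
    assert(saltA == saltB);   // every "salt" is identical
    writeln("both salts: ", saltA);
}
```

Changing the parameter to `ref ToyRng rand` (or sharing the state internally, as I've been arguing) makes the duplicates go away.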

There's another issue, too:

First a little background: With Hash_DRBG (and I would assume other 
crypto RNGs as well), the RNG is periodically reseeded, both 
automatically and on-demand. This reseeding is *not* like normal 
non-crypto reseeding: With non-crypto reseeding, you're fully resetting 
the internal state to a certain specific value and thus resetting the 
existing entropy, replacing it with the fresh new entropy. But with 
Hash_DRBG, reseeding only *adds* additional fresh entropy - nothing gets 
reset. Any existing entropy still remains. So reseeding two different 
Hash_DRBGs with the *same* reseed data *still* results in two completely 
different internal states. This is deliberately defined by the NIST spec.
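The distinction can be sketched in a few lines of D. This is just hash-based mixing in the *spirit* of Hash_DRBG's reseed - the function names and 32-byte state are illustrative, not the actual NIST algorithm:

```d
import std.digest.sha : sha256Of;

alias State = ubyte[32];

// Non-crypto style: reseeding *replaces* the state outright.
State resetSeed(const(ubyte)[] seed)
{
    return sha256Of(seed);
}

// Hash_DRBG style: fresh entropy is *mixed into* the existing state,
// so the entropy already accumulated is never discarded.
State mixSeed(State oldState, const(ubyte)[] entropy)
{
    return sha256Of(oldState[] ~ entropy);
}

void main()
{
    State a = 1, b = 2;                // two DRBGs in different states
    const(ubyte)[] fresh = [9, 9, 9];  // identical reseed data for both

    // Replace-style reseeding collapses them into the same state...
    assert(resetSeed(fresh) == resetSeed(fresh));

    // ...but mix-style reseeding keeps them distinct.
    assert(mixSeed(a, fresh) != mixSeed(b, fresh));
}
```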

There are reasons for that, but one noteworthy side-consequence is this: 
If all your Hash_DRBGs share the same internal state, then anytime one 
is reseeded with fresh entropy, they *all* benefit from the fresh 
entropy, for free.

And as a (maybe smaller) additional benefit, if different parts of your 
program are using an RNG independently of each other, but the RNGs 
internally share the same state, then *each* component contributes 
additional (semi-)non-determinism to the pool. That part's true of other 
RNGs too, but the effect is compounded further by Hash_DRBG, which 
updates the internal state differently depending on how much data is 
requested each time.


 >> As for ordinary non-crypto RNGs, I honestly can't imagine any purpose
 >> for reliably generating the same values other than "playing back" a
 >> previous sequence. But if that's what you want, then IMO it's better
 >> to record the sequence of emitted values, or even record/replay the
 >> higher-level decisions which the randomness was used to influence.
 >
 > Even that will not save you in Minecraft, where the results of different
 > seeds will produce different results in updated versions of the game
 > where there are more random possibilities available (more monsters, more
 > flora and fauna...). ;-)
 >

Yea, exactly, that's why it's best to record high-level *results* (like 
"spawn a cypress tree at position (12,73,4)" or "alien fighter ID 742 
changed velocity to (...) at time (...)") instead of relying on 
recreating it all from a given seed and deterministic RNG algorithm.

However...the mentions of Minecraft did raise one point that I'll 
concede: If you have a multiplayer environment (or more generally, a 
large chunk of shared multi-computer data) that's randomly-generated, 
then a very careful usage of a shared seed and RNG algorithm could 
drastically reduce the network bandwidth requirements.
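That trick can be sketched with std.random's Mt19937 (the terrain generator and names below are hypothetical; the whole scheme assumes both ends run a byte-for-byte identical, frozen RNG implementation):

```d
import std.random : Mt19937, uniform;

// Both machines derive the same "world" from a tiny shared seed,
// instead of shipping the whole map over the network.
int[] generateTerrain(uint seed, size_t width)
{
    auto rng = Mt19937(seed);
    int[] heights;
    foreach (_; 0 .. width)
        heights ~= uniform(0, 256, rng);  // one height per column
    return heights;
}

void main()
{
    enum sharedSeed = 12345u;  // the only thing sent over the wire
    auto serverWorld = generateTerrain(sharedSeed, 1000);
    auto clientWorld = generateTerrain(sharedSeed, 1000);
    assert(serverWorld == clientWorld);  // identical, nothing transferred
}
```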

I have no idea whether Minecraft actually works that way, or whether 
it's even randomly generated (I've only played for a few minutes - I 
like the style, but I don't have the attention span for sandbox stuff 
;) ), but
it is enough to convince me that there might be legitimate uses after 
all for individual RNG instances that emit the same sequence. (But not 
crypto-RNGs though. Those exist purely for one very specific purpose - 
to maximize unpredictability when there's limited "true" randomness.)

Granted, in that particular case of "shared random-generated data", I'm 
not sure I'd feel comfortable relying on an RNG from a third-party lib, 
or even a standard library. One tiny little thing changes and causes one 
tiny little variance in the outputs...and everything gets shot to hell. 
And it won't even be detected unless you thought ahead enough to verify 
everyone's results with checksums (or some such). So if it were me, I'd 
either write or borrow an RNG, but then keep it in the project's own 
codebase where I can more easily limit any chance of changed behavior.

 >
 > There's a difference between _relying_ on the deterministic sequence
 > from a pseudo-RNG and the principle of design that says if I declare two
 > separate RNG instances,
 >
 >      SomeRNG rng1;
 >      SomeRNG rng2;
 >
 > then the calls to rng1 should not have any effect on the results of the
 > calls to rng2.  Or alternatively, if I do,
 >
 >      SomeRNG rng1;
 >      SomeRNG rng2;
 >
 >      rng1.seed(n);
 >      rng2.seed(n);
 >
 >      foreach (a, b; lockstep(rng1, rng2).take(10))
 >      {
 >          writefln("%s\t%s", a, b);
 >      }
 >
 > then the values I get should be the same, because rng1 and rng2 are
 > independent instances that have been identically seeded.
 >

I do agree strongly that there's a major difference between "relying on 
deterministic sequence from an RNG" and "principle of design".

However, I also maintain that the "principle of design" case demands you 
should *NOT* get the same results from RNGs unless you *specifically 
intend* to be "relying on deterministic sequence from an RNG". After 
all, they *are* supposed to be "random number generators". That's their 
primary function.

I do believe strongly in the same principles of encapsulation that you 
do. But the thing is, the usual rules of proper encapsulated design are 
deliberately and fundamentally predicated on the assumption that you 
*want* reliably predictable deterministic behavior. The vast majority of 
the time, that's a correct assumption - but the whole *point* of an RNG 
is to *not* be deterministic. Or at least to get as close to 
non-determinism as reasonably possible within a fundamentally 
deterministic system.

Since the basic goals of RNGs are in fundamental disagreement with the 
underlying assumptions of usual "well encapsulated" design, that calls 
into question the applicability of usual encapsulation rules to RNGs.

At the risk of being presumptuous, I think you do agree on some level. 
Isn't that the whole reason you've proposed that RNGs change from 
structs to classes? Because you don't want identical sequences from RNGs 
unless *specifically* intended, right? The only parts of that I've 
intended to question are:

A. Does that really necessitate moving to classes?
B. Are there any legitimate use-cases for "specifically relying on 
deterministic sequence from an RNG"?

For B, the above example of "reducing bandwidth/storage requirements for 
a large set of shared random-generated data" is making me question my 
former stance of "No, there's no legitimate use-case". If B falls (and I 
suppose maybe it does), then A does too, and classes would indeed be 
necessary.

But that's all for non-crypto RNGs. For crypto-RNGs, determinism is 
*never* anything more than a concession forced by the deterministic 
nature of the machines themselves. FWIW, this does mean crypto-RNGs can 
go either struct or class, since the internal states really should be 
tied together anyway (in order to prevent their *outputs* from being 
tied together).


 > We could get into broader discussions, but I don't think it needs to be
 > more complicated than: if I create an instance of a variable of type T,
 > then I should expect that it will be independent of any other instance
 > of a variable of type T, unless I have deliberately created that link.
 >

Yes, I can agree with that much, but the difficulty is that synchronized 
"random" number generation *is* effectively an implicit link, even if 
the internal implementation doesn't explicitly link them. Having the 
same internal state *is* the link, whether it was intentional or not. To 
make them logically "unlink", some care is required and *may* involve a 
concrete link on the implementation side. (I say "may involve" instead 
of "will involve" because, naturally, just using classes as you've 
proposed also does the job of avoiding unintentional linking.)

Again, the nature of "random" or "pseudo-random" is at odds with normal 
rules and kinda flips things around. If I copy a normal struct, then 
yes, I expect most operations on both to be deterministic and return the 
same results - but that's because most structs *model* deterministic 
things. If I do:

auto rngB = rngA;

Then they're both still pseudo-random number generators, so I think for 
such a simple operation as basic assignment (or passing to a func, etc) 
it's questionable to start getting the same values from an abstraction 
intended to model (more or less) randomness. I'd expect to have to go 
further out of my way:

auto rngB = rngA.clone;

But, you're right, that does ultimately suggest that structs may be a 
questionable choice for an RNG anyway.
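One way to make that distinction explicit - and this is a hypothetical sketch of my own, not std.random's actual API - is a class-based wrapper where plain assignment shares the one generator and only an explicit clone() duplicates the stream:

```d
import std.random : Mt19937;

// Hypothetical wrapper: reference semantics by default; duplicating
// the sequence requires an explicit, deliberate clone().
final class Rng
{
    private Mt19937 gen;

    this(uint seed) { gen = Mt19937(seed); }

    uint next()
    {
        auto v = gen.front;
        gen.popFront();
        return v;
    }

    Rng clone()
    {
        auto c = new Rng(0);
        c.gen = this.gen;  // Mt19937 is a struct, so this copies state
        return c;
    }
}

void main()
{
    auto rngA = new Rng(99);
    auto rngB = rngA;          // same object: no accidental twin stream
    rngA.next();
    auto rngC = rngA.clone();  // deliberate duplication
    assert(rngB is rngA);
    assert(rngC.next() == rngA.next());  // identical only by request
}
```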


 >> I'm also unconvinced that it breaks encapsulation anyway. If anything,
 >> I think
 >> it's better encapsulated since separate instances *won't* behave like
 >> they're
 >> under "quantum entanglement". It forces separate instances to stay
 >> separate.
 >
 > Cf. my earlier example with two individual RNG instances initialized
 > with the same seed.  This ought to produce two identical sequences; with
 > your proposed static internals, it won't, because calls to rng1 will
 > affect the results of subsequent calls to rng2, and vice versa.

This is a "concrete vs logical" issue.

First of all, consider the basic idea of encapsulation: It's a black 
box. You use it, you get the result described on its label, no peeking 
inside. Note that the label on an RNG basically says "Retrieve a 
seemingly random value."

Now your question, "Are two RNGs with a shared internal state 
encapsulated?" That's where the "concrete vs logical" comes in:

Concrete:

In a concrete sense, yes, each one will cause the other to emit a 
different *specific* value, so *concretely* maybe they're not 
encapsulated. But remember, these are pseudo-*random* number generators 
- minimal predictability is the whole point. Getting the same sequence 
of values from two different things is correlation - ie, predictability. 
Predictability is the antithesis of randomness, and it violates our 
primary purpose. Encapsulation or not, the label on the box lied.

Logical:

The logical meaning of "return a (pseudo-)random number" is "gimme a 
seemingly random number, avoid any correlation with anything 
meaningful". In a logical sense, both RNGs (with shared internal state) 
are *still* emitting pseudo-random numbers, which is *exactly* what 
their encapsulation unit defines.

Even better still, the internal (*not* external) interplay between the 
two black boxes (which "encapsulation" dictates we shouldn't be worrying 
about anyway) is ensuring we get *minimal* correlation between the two 
RNGs' outputs, ie. less predictability, ie. once again *exactly* what we 
expect from a black box labeled "Push my button and I'll give you a 
seemingly random value with minimal correlation to anything meaningful".

It didn't give you the "random" number you wanted? Well then you weren't 
really looking for random anyway. You're peeking inside the box and 
saying "I want *that* random value, because that's the one the other 
randomizer gave me". Not very encapsulated.

*That* is the primary purpose of an RNG, deterministic or not.

But that brings us to the secondary purpose: Maybe we really do 
occasionally want to be able to recreate the same sequence. Well ok, 
fine. But that's a special case. Expecting it as a side-effect of using 
an encapsulation unit labeled "pseudo-random number generator" isn't 
respecting encapsulation. Encapsulation is using the button labeled "Get 
an RNG that emits the same sequence as some other RNG."

 > This is
 > pretty much the definition of breaking encapsulation, no?
 >

Well, again, the normal rules of encapsulation are predicated on the 
assumption that you want reliable and predictable, not random and 
unpredictable. With a RNG you DO want (seemingly) random and 
unpredictable. Thus, encapsulation is only applicable until it starts 
interfering with an RNG's raison d'être. Getting predictably similar 
values from two RNGs when you haven't *specifically* requested it is 
certainly in opposition to an RNG's very purpose.

 > The point of a pseudo-RNG is that if you analyse the sequence of
 > variates produced, you can't distinguish it from a genuine random
 > sequence.  But it's also intended that different instances seeded with
 > different values ought to produce statistically independent sequences
 > with very high probability.
 >
 > Your proposal would make it impossible to _have_ multiple effective
 > instances seeded differently (unless in different threads).
 >

That's why I brought up the question of whether there was a 
significantly valid use-case for doing so. If there really is, then yea, 
my idea blows ;)

For non-crypto-RNGs, I guess I'm kinda on the fence about whether 
there's a worthwhile use-case (but I guess I'm outnumbered, so I don't 
mind as long as your class idea is adopted in order to prevent 
accidental correlations).

For crypto-RNGs, I maintain that the ability to duplicate and generate 
identical sequences is an anti-feature. (For that matter, crypto-RNGs 
are specifically *supposed* to seed, and periodically reseed, 
*themselves* with data that contains at least some true randomness. So 
hopefully, none of this should be objectionable for crypto-RNGs.) Note, 
FWIW, this does *not* prevent crypto-RNGs from being class-based, so 
that aspect isn't even an issue anyway (just to be clear).


 >>
 >> Hmm, yea, I'm more and more liking the "keep it private for now, and
 >> then see
 >> what happens" approach...unfortunately... :/
 >
 > Yes, that's probably for the best.
 >

Another option is to still permit the stream versions, but label them 
with a big red warning that they're a temporary design and may change 
when std.stream gets revamped.

I think maybe that'd be a more pragmatic, user-friendly way to go. If 
they hate breakage, well, they've been warned and nobody's forcing them 
to use it. If they have reason to use it and don't mind potential 
breakage, well, then nobody's forcing them NOT to use it.

