isUniformRNG

Sun May 11 05:16:54 PDT 2014

On 11/05/14 05:58, Nick Sabalausky via Digitalmars-d wrote:
> The seed doesn't need to be compromised for synchronized RNGs to fubar security.

Before we go onto the detail of discussion, thanks very much for the extensive 
explanation.  I was slightly worried that my previous email might have come 
across as dismissive of your (completely understandable) concerns.  I'm actually 
quite keen that we can find a mutually agreeable solution :-)

> It's bad enough just to accidentally generate the same byte sequence for
> multiple things:
>
> struct CryptoRng { ulong internalState; ...etc... }
> CryptoRng myRand;
>
> // Oops! Duplicates internal state!
> ubyte[] getRandomJunk(CryptoRng rand) {
>      return rand.take(64).array();
> }
>
> // Oops! These are all identical! Passwords may as well be unsalted!
> auto saltForStevensPassword =  getRandomJunk(myRand);
> auto saltForSallysPassword =  getRandomJunk(myRand);
> auto saltForGregsPassword =  getRandomJunk(myRand);
>
> // Oops! Also identical!
> auto oneTimeURL1 =  "https://verify-email/a@a.a/"~getRandomJunk(myRand);
> auto oneTimeURL2 =  "https://verify-email/b@b.b/"~getRandomJunk(myRand);
> auto oneTimeURL3 =  "https://verify-email/c@c.c/"~getRandomJunk(myRand);
>
> May as well have just defined getRandomJunk as a constant ;)

Sure, but this is a consequence of two things (i) CryptoRng is a value type and 
(ii) it gets passed by value, not by reference.

In your example, obviously one can fix the problem by having the function 
declared instead as

     ubyte[] getRandomJunk(ref CryptoRng rand) { ... }

but I suspect you'd say (and I would agree) that this is inadequate: it's 
relying on programmer virtue to ensure the correct behaviour, and sooner or 
later someone will forget that "ref".  (It will also not handle other cases, 
such as an entity that needs to internally store the RNG, as we've discussed 
many times on the list with reference to e.g. RandomSample or RandomCover.)

Obviously one _can_ solve the problem by the internal state variables of the RNG 
being static, but I'd suggest to you that RNG-as-reference-type (which doesn't 
necessarily need to mean "class") solves that particular problem without the 
constraints that static internal variables have.

> Maybe not *too* terrible for randomized alien invader attack patterns (unless it
> makes the game boring and doesn't sell), but a fairly major failure for security
> purposes.
>
> There's another issue, too:
>
> First a little background: With Hash_DRBG (and I would assume other crypto RNGs
> as well), the RNG is periodically reseeded, both automatically and on-demand.
> This reseeding is *not* like normal non-crypto reseeding: With non-crypto
> reseeding, you're fully resetting the internal state to a certain specific value
> and thus resetting the existing entropy, replacing it with the fresh new
> entropy. But with Hash_DRBG, reseeding only *adds* additional fresh entropy -
> nothing gets reset. Any existing entropy still remains. So reseeding two
> different Hash_DRBGs with the *same* reseed data *still* results in two
> completely different internal states. This is deliberately defined by the NIST
> spec.
>
> There are reasons for that, but one noteworthy side-consequence is this: If all
> your Hash_DRBG's share the same internal state, then anytime one is reseeded
> with fresh entropy, they *all* benefit from the fresh entropy, for free.
>
> And as a (maybe smaller) additional benefit, if different parts of your program
> are using an RNG independently of each other, but the RNGs internally share the
> same state, then *each* component contributes additional (semi-)non-determinism
> to the pool. That part's true of other RNGs too, but the effect is compounded
> further by Hash_DRBG which updates the internal state differently depending on
> how much data is requested at each, umm...at each request.

This is a very interesting point.  My understanding is that in many (most?) 
programs, each individual thread does typically operate using a single RNG 
instance anyway.  The question for me is whether that's a design choice of the 
developer or something that is an unavoidable outcome of the RNG design.

For example, it's readily possible to just instantiate a single RNG (crypto or 
otherwise, as required) at the start of the program and pass that around: if 
it's a reference type (as I would argue it should be) then the effects you 
desire will still take place.

Alternatively, one can also implement a thread-global RNG instance which 
functions will use.  This is already done with rndGen, which is effectively a 
singleton pattern.  There's no reason why one couldn't also define a cryptoGen 
which works in a similar way, just using an instance of a cryto RNG where rndGen 
uses a Mersenne Twister:
https://github.com/D-Programming-Language/phobos/blob/d2f2d34d52f89dc9854069b13f52523965a7107e/std/random.d#L1161-L1174
https://github.com/WebDrake/std.random2/blob/a071d8d182eb28516198bb292a0b45f86f4425fe/std/random2/generator.d#L60-L81

The benefit of doing it this way, as opposed to static internal variables, is 
that you aren't then constrained to have only one single, effectively 
thread-global instance of _every_ RNG you create.

> However...the mentions of Minecraft did raise one point that I'll concede: If
> you have a multiplayer environment (or more generally, a large chunk of shared
> multi-computer data) that's randomly-generated, then a very careful usage of a
> shared seed and RNG algorithm could drastically reduce the network bandwidth
> requirements.

You may also recall earlier games like Elite which were able to "store" whole 
universes on a single 5 1/4" floppy by virtue of auto-generating that universe 
from a repeatable random sequence.  It was amazing what could be done on the old 
BBC Micro :-)

> I do agree strongly that there's a major difference between "relying on
> deterministic sequence from an RNG" and "principle of design".
>
> However, I also maintain that the "principle of design" case demands you should
> *NOT* get the same results from RNGs unless you *specifically intend* to be
> "relying on deterministic sequence from an RNG". After all, they *are* supposed
> to be "random number generators". That's their primary function.
>
> I do believe strongly in the same principles of encapsulation that you do. But
> the thing is, the usual rules of proper encapsulated design are deliberately and
> fundamentally predicated on the assumption that you *want* reliably predictable
> deterministic behavior. The vast majority of the time, that's a correct
> assumption - but the whole *point* of an RNG is to *not* be deterministic. Or at
> least to get as close to non-determinism as reasonably possible within a
> fundamentally deterministic system.
>
> Since the basic goals of RNGs are in fundamental disagreement with the
> underlying assumptions of usual "well encapsulated" design, that calls into
> question the applicability of usual encapsulation rules to RNGs.
>
> At the risk of being presumptuous, I think you do agree on some level. Isn't
> that the whole reason you've proposed that RNGs change from structs to classes?
> Because you don't want identical sequences from RNGs unless *specifically*
> intended, right? The only parts of that I've intended to question are:

I completely agree that _unintended_ identical RNG sequences are a very bad 
thing that must be avoided.  I think that we're in disagreement only about how 
best to achieve that in a sane/safe/uncomplicated way.  With luck we will find 
something that works for both of us :-)

What I feel is that static internal variables go too far in the opposite 
direction: they make it impossible to have even _intentionally_ identical RNG 
sequences.  There's no way, for example, to .save or .dup an RNG instance with 
this design.

However, the effect of a common and reliable source of randomness, used by all 
parts of the program, can be implemented instead by a thread-global singleton 
instance of an RNG as described above.  This seems to me to be a better solution 
as it gives you what you achieve using static variables, but without preventing 
you from instantiating as many other independent RNG instances as you like.

> A. Does that really necessitate moving to classes?

No, the important distinction is value vs. reference types, not structs v. 
classes.  One could implement the RNGs as struct-based reference types.

I wrote a class-based implementation because I wanted to see how that came out 
to various struct-based approaches monarch_dodra and I had tried.  I do think 
that it offers much value in terms of simplicity and elegance of design, but 
there may be other costs that make it untenable.  I am not sure the correct 
choice will be apparent until the current work on memory allocation settles down.

> B. Are there any legitimate use-cases for "specifically relying on deterministic
> sequence from an RNG"?

Well, when I was writing large-scale scientific simulations, it was certainly 
useful to be able to re-run them with the same random sequence.  Given the 
number of random variates I was generating in the process, it was not really 
practical to store the sequence on memory or disk.  The same is true of many 
circumstances where one is writing a program that relies on randomness, but 
wants to test it reliably.

However, fundamentally this comes down to a design point.  Pseudo-random RNGs 
are _always_ deterministic in their operation once seeded.  The difference we're 
really talking about here is whether the source of randomness used by different 
parts of a program is thread-global or not.

And so, the question becomes rather, "Is it legitimate to enforce a 
thread-global source of randomness in all circumstances?"  I'd say the answer to 
that is a definite no.

Bear in mind that one nasty consequence of using static internal variables to 
force thread-global behaviour is that it means that the presence or absence of 
unittests might affect the actual running program.  Consider:

     struct SomeRNG { private static InternalState _internals; ... }

     unittest
     {
         SomeRNG rng;
         rng.seed(KnownSeed);
         // verify that SomeRNG produces expected sequence of variates
     }

Now when you start your _actual_ program, the internal state of SomeRNG will 
already have been set by the unittests.  Obviously if you are doing things 
properly you will scrub this by re-seeding the RNG at the start of your program 
proper, but the risk is clearly still there.

> For B, the above example of "reducing bandwidth/storage requirements for a large
> set of shared random-generated data" is making me question my former stance of
> "No, there's no legitimate use-case". If B falls (and I suppose it maybe it
> does), then A does too, and classes would indeed be necessary.

Yes, I think that is one very good example of a use-case, which is probably one 
shared between a bunch of different applications in practice (scientific 
simulation, games, probably other stuff too).

As I said before, I don't think it mandates _classes_, but it does mandate 
reference-type RNGs that are properly encapsulated.

> But that's all for non-crypto RNGs. For crypto-RNGs, determinism is *never*
> anything more than a concession forced by the deterministic nature of the
> machines themselves. FWIW, this does mean  crypto-RNGs can go either struct or
> class, since the internal states really should be tied together anyway (in order
> to prevent their *outputs* from being tied together).

What do you think about the global-singleton-instance pattern as a way of 
achieving what you want?

I think I will stop my reply here for now, because I think that your response to 
this question (and the explanations leading up to it) probably determines how I 
will respond to your later remarks.  What are your thoughts?

Best wishes,

     -- Joe