isUniformRNG

Mon May 12 11:17:05 PDT 2014

On 5/11/2014 8:16 AM, Joseph Rushton Wakeling via Digitalmars-d wrote:> 
On 11/05/14 05:58, Nick Sabalausky via Digitalmars-d wrote:
 >> The seed doesn't need to be compromised for synchronized RNGs to fubar
 >> security.
 >
 > Before we go onto the detail of discussion, thanks very much for the
 > extensive explanation.  I was slightly worried that my previous email
 > might have come across as dismissive of your (completely understandable)
 > concerns.

Oh, not at all. I've been finding the discussion rather interesting. :)

 > I'm actually quite keen that we can find a mutually agreeable
 > solution :-)
 >

I agree. And I think our stances aren't quite as opposed as they may seem.

 > Sure, but this is a consequence of two things (i) CryptoRng is a value
 > type and (ii) it gets passed by value, not by reference.
 >
 > In your example, obviously one can fix the problem by having the
 > function declared instead as
 >
 >      ubyte[] getRandomJunk(ref CryptoRng rand) { ... }
 >
 > but I suspect you'd say (and I would agree) that this is inadequate:
 > it's relying on programmer virtue to ensure the correct behaviour, and
 > sooner or later someone will forget that "ref".  (It will also not
 > handle other cases, such as an entity that needs to internally store the
 > RNG, as we've discussed many times on the list with reference to e.g.
 > RandomSample or RandomCover.)
 >

Right. Agreed on all points.

 > Obviously one _can_ solve the problem by the internal state variables of
 > the RNG being static, but I'd suggest to you that RNG-as-reference-type
 > (which doesn't necessarily need to mean "class")

Yea, doesn't necessarily mean class, but if it is made a reference type 
then class is likely the best option. For example, I'd typically regard 
struct* in a D API as a code smell.

 > solves that particular
 > problem without the constraints that static internal variables have.
 >

Pretty much agreed, but the only question is whether those constraints 
of using static internals are good/bad/inconsequential:

For non-crypto RNGs:

While I've tended to think the usefulness of a library-provided RNG that 
permits independent-but-identically-seeded instances is small and 
debatable, through this discussion I have become sufficiently convinced 
that they're worth permitting. Besides, even if there weren't need for 
it, the downsides of permitting such a feature (as long as it's not 
accident-prone) are minimal, if not zero.

So I'm fine going the class route (or otherwise reference-based) and 
making internal state per-instance. Or even having a "duplicate this RNG 
with identical state" function, if people want it.

I think we're pretty well agreed on non-crypto RNGs. Your stance is 
convincing here.

For crypto-RNGs:

A crypto-RNG exists for exactly one reason only: To stretch the useful 
lifetime of a limited source of truely non-deterministic 
randomness/entropy (or nearly-true randomness, if that's the best 
available). Because of this, any and all determinism is a concession, 
not a feature (unlike for non-crypto deterministic-RNGs). Even *how* you 
use it deliberately affects the internal state, not just "how many times 
you asked for a value". These things go all-out to throw any wrenches 
they can into any sources of determinism. I was actually quite impressed :)

In fact, the seeding/reseeding is specifically defined to be completely 
internal to the crypto-RNG (at least with Hash_DRBG anyway, probably 
others) - the user *never* provides a seed - which intentionally makes 
it that much harder for an application to use deterministic seeds (which 
would compromise the security implications of the crypto-RNG, and 
therefore defeat the whole point of using a crypto-RNG instead of a 
normal RNG).

All this is because determinism is NOT what a crypto-RNG is for, it's 
exactly what a crypto-RNG is specifically designed to fight against.

What all that implies: A crypto-RNG shouldn't *explicitly* provide a way 
to get different instances with identical internal states, and 
definitely shouldn't let it happen by accident. It's also under 
absolutely no obligation whatsoever for relying on determinism to even 
be possible at all (it would carry certain difficulties anyway).

Luckily though, that doesn't imply anything particularly profound. *If* 
it's even possible to get identical crypto-RNGs at all, then as long you 
have to work to do it (memcopying raw class data, providing a custom 
entropy source that's written to be deterministic, or even using muliple 
threads/processes, etc), then everything's all good.

Therefore, a class or otherwise reference-based approach is fine for 
crypt-RNGs, too.

I think my preference would still be to keep the internal state static 
here though (again, just speaking for crypto-RNGs only). As I've argued, 
the determinism is a non-feature for crypto-RNGs (they deliberately 
fight it every step of the way), and the shared state carries a couple 
entropy-related benefits (Reseeding one, ie accumulating additional 
fresh entropy, benefits all others. And all RNG activity within in the 
thread acts as additional entropy). So I only see upsides, not downsides.

The only potential issue I see: Are there going to be people who are 
looking for determinism in an RNG and intentionally want to use 
something like Hash_DRBG for that. Maybe for the sake of it's usage of 
SHA. They would have to either limit the number of values they generate 
(to avoid the algorithm's internal reseeding) or else provide their own 
entropy source (not too difficult though, since I've recently 
parametrized HashDRBG on that). But I suppose it could be done.

Shoot, now I'm not so certain anymore...

Well, I suppose permitting a crypto-RNG to be used in replacement for a 
simpler deterministic RNG would be perfectly reasonable to permit 
*provided that* it wasn't done in a way that would make it easier for 
the crypto-uses to make the wrong choice of non-global instances. 
Although, I don't know whether that would ever be appropriate anyway, if 
we already have RNGs intended for non-crypto deterministic purposes.

 > This is a very interesting point.  My understanding is that in many
 > (most?) programs, each individual thread does typically operate using a
 > single RNG instance anyway.  The question for me is whether that's a
 > design choice of the developer or something that is an unavoidable
 > outcome of the RNG design.
 >
 > For example, it's readily possible to just instantiate a single RNG
 > (crypto or otherwise, as required) at the start of the program and pass
 > that around: if it's a reference type (as I would argue it should be)
 > then the effects you desire will still take place.
 >

Yea, that is true.

 > Alternatively, one can also implement a thread-global RNG instance which
 > functions will use.  This is already done with rndGen, which is
 > effectively a singleton pattern.  There's no reason why one couldn't
 > also define a cryptoGen which works in a similar way, just using an
 > instance of a cryto RNG where rndGen uses a Mersenne Twister:
 > 
https://github.com/D-Programming-Language/phobos/blob/d2f2d34d52f89dc9854069b13f52523965a7107e/std/random.d#L1161-L1174 

 >
 > 
https://github.com/WebDrake/std.random2/blob/a071d8d182eb28516198bb292a0b45f86f4425fe/std/random2/generator.d#L60-L81 

 >
 >
 > The benefit of doing it this way, as opposed to static internal
 > variables, is that you aren't then constrained to have only one single,
 > effectively thread-global instance of _every_ RNG you create.
 >

Non-crypto RNGs: Yea, that's good, I like it.

Crypto RNGs: Hmm, good question. My concern is that it's something the 
user has to specifically choose to use. It's not what automatically 
happens when you ask for an instance of a specific RNG.

 > You may also recall earlier games like Elite which were able to "store"
 > whole universes on a single 5 1/4" floppy by virtue of auto-generating
 > that universe from a repeatable random sequence.  It was amazing what
 > could be done on the old BBC Micro :-)
 >

Yea, I have a deep appreciation for those 8-bit-era machines. I was an 
Apple-II guy myself. (Apple-IIc).

 > I completely agree that _unintended_ identical RNG sequences are a very
 > bad thing that must be avoided.  I think that we're in disagreement only
 > about how best to achieve that in a sane/safe/uncomplicated way.  With
 > luck we will find something that works for both of us :-)
 >
 > What I feel is that static internal variables go too far in the opposite
 > direction: they make it impossible to have even _intentionally_
 > identical RNG sequences.  There's no way, for example, to .save or .dup
 > an RNG instance with this design.
 >

Fair enough, at least for non-crypto RNGs. For crypto RNGs, I'm thinking 
now that static state can be avoided as long as the design still does a 
sufficient job of steering actual crypto-purpose users away from 
multiple separate instances.

 > However, the effect of a common and reliable source of randomness, used
 > by all parts of the program, can be implemented instead by a
 > thread-global singleton instance of an RNG as described above.  This
 > seems to me to be a better solution as it gives you what you achieve
 > using static variables, but without preventing you from instantiating as
 > many other independent RNG instances as you like.
 >

Yea, it may. For crypto-RNGs though I'd have it give it some more thought.

 >> A. Does that really necessitate moving to classes?
 >
 > No, the important distinction is value vs. reference types, not structs
 > v. classes.  One could implement the RNGs as struct-based reference 
types.

True, although in general I think any reference-based design that 
doesn't use classes requires fairly strong justification for doing so. I 
think we're unlikely to see that happen here. So I'm good with classes 
(for both crypto and non-crypto RNGs) unless some big unexpected issue 
pops up.

 > I wrote a class-based implementation because I wanted to see how that
 > came out to various struct-based approaches monarch_dodra and I had
 > tried.  I do think that it offers much value in terms of simplicity and
 > elegance of design, but there may be other costs that make it
 > untenable.
 >

Have you found any such costs yet, or anything in particular that 
suggests there may be some? Intuitively, I wouldn't think the minor 
amount of (by default) GC heap usage would matter (just as one 
particular aspect of classes).

 >> B. Are there any legitimate use-cases for "specifically relying on
 >> deterministic
 >> sequence from an RNG"?
 >
 > Well, when I was writing large-scale scientific simulations, it was
 > certainly useful to be able to re-run them with the same random
 > sequence.  Given the number of random variates I was generating in the
 > process, it was not really practical to store the sequence on memory or
 > disk.  The same is true of many circumstances where one is writing a
 > program that relies on randomness, but wants to test it reliably.
 >

Fair enough. Between that and games, I'm convinced.

 > And so, the question becomes rather, "Is it legitimate to enforce a
 > thread-global source of randomness in all circumstances?"  I'd say the
 > answer to that is a definite no.
 >

For "...in **all** circumstances", I'm now sold on your answer, too.

What I now think is unclear is: "Is there *some* circumstance where it's 
legitimate to enforce a thread-global source of randomness?"

 >      struct SomeRNG { private static InternalState _internals; ... }
 >
 >      unittest
 >      {
 >          SomeRNG rng;
 >          rng.seed(KnownSeed);
 >          // verify that SomeRNG produces expected sequence of variates
 >      }
 >
 > Now when you start your _actual_ program, the internal state of SomeRNG
 > will already have been set by the unittests.  Obviously if you are doing
 > things properly you will scrub this by re-seeding the RNG at the start
 > of your program proper, but the risk is clearly still there.
 >

Good point.

 > What do you think about the global-singleton-instance pattern as a way
 > of achieving what you want?
 >
 > I think I will stop my reply here for now, because I think that your
 > response to this question (and the explanations leading up to it)
 > probably determines how I will respond to your later remarks.  What are
 > your thoughts?
 >

Again, for non-crypto RNGs, I'm totally with you now.

For crypto-RNGs, I think it's unclear. The question hinges on:

A. "Can we permit deterministic usages without making crypto users more 
likely to have multiple independent instances within a thread? 
(regardless of whether crypto users do it intentionally or unintentionally)"

That question, of course, needs to be balanced with:

B. "Is 'non-crypo deterministic' an inappropriate misuse of crypto-RNGs 
(when we already have RNGs designed for non-crypo deterministic purposes 
anyway?)"

and C. "How much should we even care about B?"

I'm concerned that the global-singleton-instance pattern may be 
insufficient for "A".

What I think I'd like to see is this: For *all* RNGs (crypto and 
non-crypto), a reference-based design where obtaining a common 
thread-global instance is always the implicit default behavior, and 
individual or duplicate instances can always be obtained *explicitly* in 
a way that cannot be misread/misinterpreted. That would succeed at "A" 
and render "B" and "C" non-issues. Plus, I think it would be generally 
appropriate, even for deterministic non-crypto RNGs, anyway.

Also, I don't want to forget the issue of stream interfaces. What do you 
think about including them, but just with a big red "Subject to change 
pending a new std.stream" banner in the docs? I think that's a perfectly 
pragmatic "best of both worlds" compromise. Think it would be 
well/poorly-received?