std.random2

Tue Jan 8 09:52:17 PST 2013

Hello all,

Following discussion on the pull request adding normal random number generation 
to std.random [ https://github.com/D-Programming-Language/phobos/pull/1029 ], 
some issues were raised which are best discussed with the whole community.

The heart of this is the design of pseudo-random number generators (PRNGs) in 
Phobos.  Currently these are implemented as value types, which has a number of 
unpleasant effects:

    -- They're expensive to pass around (some PRNGs have a size of a MB or more)

    -- Passing by value is statistically unsafe, as it can result in identical
       random sequences being generated in different parts of the code.  This
       already affects at least one part of std.random itself: see,
       http://d.puremagic.com/issues/show_bug.cgi?id=8247
       http://forum.dlang.org/thread/oiczxzkzxketxitncghl@forum.dlang.org

    -- Simply passing by reference isn't an adequate solution, as there will be
       cases (as with RandomSample, detailed in bug 8247) where you have to
       store the RNG.  Storing e.g. a pointer or reference would be unsafe; the
       only adequate solution is that PRNGs be (safe) reference types.

monarch_dodra did some work on this which was set aside in the short term 
because it would (unavoidably) be a breaking change [ 
https://github.com/D-Programming-Language/phobos/pull/893 ].  To avoid this, the 
proposed solution is to create a std.random2.

However, these issues seem to me to have broader implications for the design of 
random-number functionality in Phobos, so if we're going to do a std.random2, 
it's worth mapping them out so as to get them right.

The most obvious (to me) is that these issues which apply to PRNGs apply equally 
well to random number distributions.  For example the Ziggurat algorithm 
requires storing several hundred constants, so passing by value is expensive; 
and several different algorithms generate and store multiple random variates at 
a time, so copying/passing by value will result in unintended correlations in 
sequences of variates.

This has further implications if for example we want to create a 
VariateGenerator (or perhaps in D-ish terms, a VariateRange) which couples a 
random distribution with a PRNG -- this is unlikely to work unless both the 
random distribution and the PRNG are reference types.

Finally, there are more general issues about how new functionality should be 
implemented.  C++11 is given as a model in the std.random documentation, but 
this is clearly a guide rather than something to copy blindly -- existing 
functions and structs already deviate from it in ways that reflect D's 
preference for ranges and its superior generics.  We need a clear picture of how 
to do this for functionality that has not yet been implemented.

For example: in the case of random distributions, the current example of 
uniform() offers only a function interface and therefore little guidance about 
how to create struct implementations for more complex algorithms which require 
persistent storage (e.g. Ziggurat or Box-Muller).  Should they follow 
C++11/Boost.Random in returning variates via opCall, or should they be coupled 
with a PRNG at construction time (as with RandomSample) and implement a range of 
variates?

My inclination here is to take some time to map out the different 
interface/design options and to present the choices to the community for review 
as a precursor to creating a std.random2.  It seems the only really sensible 
choice to ensure that we get a good future-proof design.

What does everyone think?

Best wishes,

     -- Joe