std.random2

Tue Jan 8 10:38:20 PST 2013

On Tuesday, 8 January 2013 at 17:52:24 UTC, Joseph Rushton 
Wakeling wrote:
> Hello all,
>
> Following discussion on the pull request adding normal random 
> number generation to std.random [ 
> https://github.com/D-Programming-Language/phobos/pull/1029 ], 
> some issues were raised which are best discussed with the whole 
> community.
>
> The heart of this is the design of pseudo-random number 
> generators (PRNGs) in Phobos.  Currently these are implemented 
> as value types, which has a number of unpleasant effects:
>
>    -- They're expensive to pass around (some PRNGs have a size 
> of a MB or more)
>
>    -- Passing by value is statistically unsafe, as it can 
> result in identical
>       random sequences being generated in different parts of 
> the code.  This
>       already affects at least one part of std.random itself: 
> see,
>       http://d.puremagic.com/issues/show_bug.cgi?id=8247
>       
> http://forum.dlang.org/thread/oiczxzkzxketxitncghl@forum.dlang.org
>
>    -- Simply passing by reference isn't an adequate solution, 
> as there will be
>       cases (as with RandomSample, detailed in bug 8247) where 
> you have to
>       store the RNG.  Storing e.g. a pointer or reference would 
> be unsafe; the
>       only adequate solution is that PRNGs be (safe) reference 
> types.
>
> monarch_dodra did some work on this which was set aside in the 
> short term because it would (unavoidably) be a breaking change 
> [ https://github.com/D-Programming-Language/phobos/pull/893 ].  
> To avoid this, the proposed solution is to create a std.random2.
>
> However, these issues seem to me to have broader implications 
> for the design of random-number functionality in Phobos, so if 
> we're going to do a std.random2, it's worth mapping them out so 
> as to get them right.
>
> The most obvious (to me) is that these issues which apply to 
> PRNGs apply equally well to random number distributions.  For 
> example the Ziggurat algorithm requires storing several hundred 
> constants, so passing by value is expensive; and several 
> different algorithms generate and store multiple random 
> variates at a time, so copying/passing by value will result in 
> unintended correlations in sequences of variates.
>
> This has further implications if for example we want to create 
> a VariateGenerator (or perhaps in D-ish terms, a VariateRange) 
> which couples a random distribution with a PRNG -- this is 
> unlikely to work unless both the random distribution and the 
> PRNG are reference types.
>
> Finally, there are more general issues about how new 
> functionality should be implemented.  C++11 is given as a model 
> in the std.random documentation, but this is clearly a guide 
> rather than something to copy blindly -- existing functions and 
> structs already deviate from it in ways that reflect D's 
> preference for ranges and its superior generics.  We need a 
> clear picture of how to do this for functionality that has not 
> yet been implemented.
>
> For example: in the case of random distributions, the current 
> example of uniform() offers only a function interface and 
> therefore little guidance about how to create struct 
> implementations for more complex algorithms which require 
> persistent storage (e.g. Ziggurat or Box-Muller).  Should they 
> follow C++11/Boost.Random in returning variates via opCall, or 
> should they be coupled with a PRNG at construction time (as 
> with RandomSample) and implement a range of variates?
>
> My inclination here is to take some time to map out the 
> different interface/design options and to present the choices 
> to the community for review as a precursor to creating a 
> std.random2.  It seems the only really sensible choice to 
> ensure that we get a good future-proof design.
>
> What does everyone think?
>
> Best wishes,
>
>     -- Joe

I imagine there has been some detailed discussion of the 
std.nameX idea of libraries so forgive me if this has been 
discussed. Using this as an approach to essentially replacing 
libraries instead of the depreciation route would seem to 
potentially lead to a situation where you have std.name1, 2, 3, 4 
ad infinitum which may not have exactly the same feature set or 
use cases and would lead to people hunting around multiple 
libraries which on the face of it are for the same thing and 
using them in projects. Would it not be better to bite the bullet 
and depreciate? random2 simply sounds like random done properly, 
with random and random2 people would often use random, and so 
still suffer the consequences of its issues, and possibly 
overlook random2. You're depreciating because it's broken, not 
because people want to mess around with trivialities.