1st draft of complete class-based std.random successor

Wed Mar 19 16:49:38 PDT 2014

Hello all,

As some of you may already know, monarch_dodra and I have spent 
quite a lot of time over the last year discussing the state of 
std.random.  To cut a long story short, there are significant 
problems that arise because the current RNGs are value types 
rather than reference types.  We had quite a lot of back and 
forth on different design ideas, with a lot of helpful input from 
others in the community, but at the end of the day there are 
really only two broad approaches: create structs that implement 
reference semantics internally, or use classes.  So, as an 
exercise, I decided to create a class-based std.random.

The preliminary (but comprehensive) results of this are now 
available here:
https://github.com/WebDrake/std.random2

Besides re-implementing random number generators as classes 
rather than structs, the new code splits std.random2 into a 
package of several different modules:

    * std.random2.generator, pseudo-random number generators;

    * std.random2.device, non-deterministic random sources;

    * std.random2.distribution, random distributions such as 
uniform,
      normal, etc.;

    * std.random2.adaptor, random "adaptors" such as randomShuffle,
      randomSample, etc.

    * std.random2.traits, RNG-specific traits such as isUniformRNG
      and isSeedable.

A package.d file groups them together so one can still import all 
together via "import std.random2".  I've also taken the liberty 
of following the new guideline to place import statements as 
locally as possible; it was striking how easy and clean this made 
things, and it should be easy to port that particular change back 
to std.random.

The new package implements all of the functions, templates and 
range objects from std.random except for the old 
std.random.uniformDistribution, whose name I have cannibalized 
for better purposes.  Some have been updated: the 
MersenneTwisterEngine has been tweaked to match the corresponding 
code from Boost.Random, and this in turn has allowed the 
definition of a 64-bit Mersenne Twister (Mt19937_64) and an 
alternative 32-bit one (Mt11213b).

There are also a number of entirely new entries.  
std.random2.distribution contains not just existing functions 
such as dice and uniform, but also range-based random 
distribution classes UniformDistribution, NormalDistribution and 
DiscreteDistribution; the last of these is effectively a 
range-based version of dice, and is based on Chris Cain's 
excellent work here: 
https://github.com/D-Programming-Language/phobos/pull/1702

The principal weak point in terms of functionality is 
std.random2.device, where the implemented random devices (based 
on Posix' /std/random and /std/urandom) are really very primitive 
and just there to illustrate the principle.  However, since their 
API is pretty simple (they're just input ranges with min and max 
defined) there should be plenty of opportunity to improve and 
extend the internals in future.  Advice and patches are welcome 
for everything, but particularly here :-)

What's become quite apparent in the course of writing this 
package is how much more natural it is for ranges implementing 
randomness to be class objects.  The basic fact that another 
range can store a copy of an RNG internally without creating a 
copy-by-value is merely the start: for example, in the case of 
the class implementation of RandomSample, we no longer need to 
have complications like,

     @property auto ref front()
     {
         assert(!empty);
         // The first sample point must be determined here to avoid
         // having it always correspond to the first element of the
         // input.  The rest of the sample points are determined 
each
         // time we call popFront().
         if (_skip == Skip.None)
         {
             initializeFront();
         }
         return _input.front;
     }

that were necessary to avoid bugs like 
https://d.puremagic.com/issues/show_bug.cgi?id=7936; because the 
class-based implementation copies by reference, we can just 
initialize everything in the constructor.  Similarly, issues like 
https://d.puremagic.com/issues/show_bug.cgi?id=7067 and 
https://d.puremagic.com/issues/show_bug.cgi?id=8247 just vanish.

Obvious caveats about the approach include the fact that classes 
need to be new'd, and questions over whether allocation on the 
heap might create speed issues.  The benchmarks I've run (code 
available in the repo) seem to suggest that at least the latter 
is not a worry, but these are obviously things that need to be 
considered.  My own feeling is that ultimately it is a 
responsibility of the language to offer nice ways to allocate 
classes without necessarily relying on new or the GC.

A few remarks on design and other factors:

    * The new range objects have been implemented as final classes 
for
      speed purposes.  However, I tried another approach where the 
RNG
      class templates were abstract classes, and the individual
      parameterizations were final-class subclasses of those rather
      than aliases.  This was noticeably slower.  My OO-fu is not 
really
      sufficient to explain this, so if anybody can offer a 
reason, I'd
      be happy to learn it.

    * A design question I considered but have not yet pursued: 
since at
      least two functions require passing the RNG as the first 
parameter
      (dice and discreteDistribution), perhaps this should be made 
a
      general design pattern for everything?  It would make it 
harder to
      adapt code using the existing std.random but would create a 
useful
      uniformity.

    * I would have liked to ensure that every random distribution 
had
      both a range- and function-based version.  However, I came 
to the
      conclusion that solely function-based versions should be 
avoided
      if either (i) the function would need to maintain internal 
state
      between calls, or (ii) the function would need to allocate 
memory
      per call.  The first is why for example NormalDistribution 
exists
      only as a class/range.  The second might in principle raise 
some
      objections to dice, but as dice seems to be a reasonably 
standard
      function, I kept it in.

    * It might be good to implement helper functions for the 
individual
      RNGs (e.g. just as RandomSample has a randomSample helper 
function
      to deliver instances, so Mt19937 could have a corresponding
      mt19937 helper function returning Mt19937 instances seeded 
in line
      with helper function parameters).

    * Those with long memories may recall that when I originally 
wrote
      up my NormalDistribution code, it was written to allow 
various
      "normal engines" to be plugged in; mine was Box-Muller, but 
jerro
      also contributed a Ziggurat-based engine.  This could still 
be
      provided here, although my own inclination is that it's 
probably
      best for Phobos to provide one single 
good-for-general-purpose-use
      implementation.

Known issues:

    * While every bugfix I've made in the course of implementing 
this
      package has been propagated back to std.random where 
possible,
      this package is missing some of the more recent improvements 
to
      std.random by other people (e.g. I think it's missing Chris 
Cain's
      update to integer-based uniform()).

    * The unittest coverage is overall pretty damn good, but there 
are
      weak spots in std.random.distribution and std.random2.device.
      Some of the "unittests" in these cases are no more than basic
      developer sanity checks that print results to console, and 
need
      to be replaced by well-defined, silent-unless-failed 
alternatives.

    * Some of the .save functions are implemented with the help of 
rather
      odd private constructors; it would probably be much better 
to redo
      these in terms of public this(typeof(this)) constructors.

    * The random devices _really_ need to be better.  Consider the 
current
      versions as placeholders ... :-)

Finally, a note on authorship: since this is still based very 
substantially on std.random, I've made an effort to check git 
logs and ensure that authors and copyright records (and dates of 
contribution) are correct.  My general principle here has been 
that listed authors should only include those who've made a 
substantial contribution (i.e. whole functions, large numbers of 
unittests, ...), not just various 1-line tweaks.  But if anyone 
has any objection to any of the names, dates or other credits 
given, or if anybody would like their name removed (!), just let 
me know.

I owe a great debt of gratitude to many people here on the 
forums, and monarch_dodra in particular, for a huge amount of 
useful discussion, advice and feedback that has made its way into 
the current code.  Thank you all for your time, thoughts, ideas 
and patience.

Anyway, please feel free to review, destroy and otherwise do fun 
stuff with this module.  I hope that some of you will find it 
immediately useful, but please note that feedback and advice may 
result in breaking changes -- this is intended to wind up in 
Phobos, so it really needs to be perfect when it does so.  Let's 
review it really well and make it happen!

Thanks and best wishes,

     -- Joe