Normal/Gaussian random number generation for D

Tue Nov 6 05:02:59 PST 2012

Sorry for the delayed response; last week was an unfortunate mix of getting sick
and then having a pile of work-related stuff to deal with.

On 10/26/2012 11:03 PM, jerro wrote:
>> Well, as I defined the function, the UniformRNG input has a default value of
>> rndGen, so if you call it just as normal(mean, sigma); it should work as
>> intended.  But if there's some factor which means it would be better to define
>> a separate function which doesn't receive the RNG as input, I can do that.
>
> I don't see any downside to this.

Which "this" do you mean?  My current approach, or adding a separate function
that doesn't take the RNG as input? :-)

> I have only been thinking about the Ziggurat algorithm, but you are right, it
> does depend on the details of the technique. For Box-Muller (and other engines
> that cache samples) it only makes sense to compute the first samples in the
> opCall. But for the Ziggurat algorithm, tables that must be computed before you
> can start sampling aren't changed during sampling and computing the tables
> doesn't require any additional arguments. So it makes the most sense for those
> tables to be initialized in the struct's constructor in the struct based API.

So we should assume by default then that the struct's constructor should take an 
RNG as input, to enable it to calculate these first values if it needs to?
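
To make the distinction concrete, here is a rough sketch of the caching case.
BoxMullerSketch is a hypothetical name of mine, not a proposed Phobos type, and
it assumes the RNG is handed to opCall rather than stored.  It is a standard
mean 0, sigma 1 Box-Muller with no tuning.  The point is only that nothing is
drawn from the RNG until the first sample is requested.  A Ziggurat-style
engine would presumably build its tables in a constructor without needing the
RNG at all, which this sketch does not attempt.

```d
import std.math : cos, log, sin, sqrt, PI;
import std.random : Random, uniform;

/// Hypothetical Box-Muller engine (illustration only): the RNG is first
/// touched in opCall, never at construction time.
struct BoxMullerSketch
{
    private double cached;
    private bool hasCached = false;

    /// Returns one standard normal variate, drawing from urng only when
    /// no cached value is available.
    double opCall(UniformRNG)(ref UniformRNG urng)
    {
        if (hasCached)
        {
            hasCached = false;
            return cached;
        }
        // uniform(0.0, 1.0, urng) lies in [0, 1), so u1 lies in (0, 1]
        // and log(u1) is always finite.
        immutable u1 = 1.0 - uniform(0.0, 1.0, urng);
        immutable u2 = uniform(0.0, 1.0, urng);
        immutable r = sqrt(-2.0 * log(u1));
        cached = r * sin(2.0 * PI * u2);
        hasCached = true;
        return r * cos(2.0 * PI * u2);
    }
}

void main()
{
    import std.stdio : writeln;

    auto rng = Random(2012);    // fixed seed so runs are repeatable
    BoxMullerSketch engine;     // construction draws nothing from rng
    writeln(engine(rng));       // draws two uniforms, returns one normal
    writeln(engine(rng));       // returns the cached value, no RNG call
}
```

With this layout the constructor has no need for an RNG argument.  Whether
other engines would need one is exactly the open question.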

> In my previous post I was talking about initializing static instances of the
> engine used in the normal() function. The advantage of initializing in a static
> constructor is that you don't need an additional check every time the normal()
> function is called. But because we will also have a struct based API, that will
> not require such checks (at least not for all engines), this isn't really that
> important. So we can also initialize the global engine instance in a call to
> normal(), if this simplifies things.

I guess my feeling here is that the values generated by an RNG should depend on 
when it is called, and not at all on when it is instantiated.

i.e. if I do something like

     auto nrng = Normal!()(0, 1);
     writeln( uniform(0.0, 1.0) );
     writeln( uniform(0.0, 1.0) );
     writeln( nrng() );
     writeln( nrng() );

I should get the same output as if I do,

     writeln( uniform(0.0, 1.0) );
     writeln( uniform(0.0, 1.0) );
     auto nrng = Normal!()(0, 1);
     writeln( nrng() );
     writeln( nrng() );

Put another way, suppose I change from, e.g.,

     auto nrng = Normal!(real, Engine1)(0, 1);
     writeln( uniform(0.0, 1.0) );
     writeln( uniform(0.0, 1.0) );
     writeln( nrng() );
     writeln( nrng() );

to

     auto nrng = Normal!(real, Engine2)(0, 1);
     writeln( uniform(0.0, 1.0) );
     writeln( uniform(0.0, 1.0) );
     writeln( nrng() );
     writeln( nrng() );

... then I would expect to see different results from the normal RNG but 
identical results from uniform().  If the constructor of the normal engine calls 
the RNG, the uniform() results will change, no?
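
A small sketch of the effect I'm worried about.  LazyEngine and EagerEngine are
made-up stand-ins rather than real engines, and they take an explicitly seeded
Random instead of rndGen so that the comparison is repeatable.  The only
difference between them is whether the constructor consumes a variate.

```d
import std.random : Random, uniform;

/// Hypothetical engine that touches the RNG only when sampled.
struct LazyEngine
{
    double sample(ref Random urng)
    {
        // Stand-in for a real normal variate.
        return uniform(0.0, 1.0, urng);
    }
}

/// Hypothetical engine whose constructor consumes one variate up front.
struct EagerEngine
{
    private double first;

    this(ref Random urng)
    {
        first = uniform(0.0, 1.0, urng);
    }
}

/// Seeds a fresh RNG, constructs one of the two engines, and returns the
/// next value that uniform() produces from that RNG.
double firstUniformAfterConstruction(bool eager)
{
    auto rng = Random(42);
    if (eager)
    {
        auto e = EagerEngine(rng);  // advances rng by one draw
    }
    else
    {
        LazyEngine e;               // leaves rng untouched
    }
    return uniform(0.0, 1.0, rng);
}

void main()
{
    import std.stdio : writeln;

    auto baseline = Random(42);
    immutable expected = uniform(0.0, 1.0, baseline);

    // Lazy construction leaves the uniform() stream as it was.
    writeln(firstUniformAfterConstruction(false) == expected);
    // Eager construction shifts the stream by one draw.
    writeln(firstUniformAfterConstruction(true) == expected);
}
```

The first comparison holds by construction.  The second should fail, because
the eager constructor has already consumed the value that uniform() would
otherwise have returned.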

>> I'm going to turn that into patches for Phobos which I'll put up on GitHub in
>> the next days, so we can pull and push and test/write together as needed.
>
> Maybe we should also have a separate file in a separate branch for tests. There
> will probably be a lot of code needed to test this well and the tests could take
> a long time to run, so I don't think they should go into unit test blocks (we
> can put some simpler tests in unit test blocks later). It may also be useful to
> use external libraries such as dstats for tests.

Yes, a random.d test suite probably should be another project.

Regardless of tests, let's focus for now on getting the API right for this case 
of non-uniform-random-number-generator-with-internal-engine, with normal and 
exponential as the initial cases.  We can always label some engines as "use at 
own risk" in the short term!