Normal/Gaussian random number generation for D

Tue Nov 6 10:14:56 PST 2012

>> I don't see any downside to this.
>
> Which "this" do you mean?  My current approach, or the adding 
> of an extra separate function? :-)

Your current approach.

>> I have only been thinking about the Ziggurat algorithm, but 
>> you are right, it
>> does depend on the details of the technique. For Box-Muller 
>> (and other engines
>> that cache samples) it only makes sense to compute the first 
>> samples in the
>> opCall. But for the Ziggurat algorithm, tables that must be 
>> computed before you
>> can start sampling aren't changed during sampling and 
>> computing the tables
>> doesn't require any additional arguments. So it makes the most 
>> sense for those
>> tables to be initialized in the struct's constructor in the 
>> struct based API.

> So we should assume by default then that the struct's 
> constructor should take an RNG as input, to enable it to 
> calculate these first values if it needs to?

No, I think that if the engine defines initialize() function 
(with no parameters), it should be called in the constructor of 
Normal. I don't think the constructor of Normal should take an 
RNG as input. I think what you currently do in normal.d is fine, 
I would just add something like this:

static if(is(typeof(_engine.initialize())))
     _engine.initialize();

to Normal's constructor. Then ZigguratEngine can define 
initialize() and do its initialization there (it doesn't need a 
uniform RNG for initialization) and NormalBoxMullerEngine can 
remain unchanged.

>> In my previous post I was talking about initializing static 
>> instances of the
>> engine used in the normal() function. The advantage of 
>> initializing in a static
>> constructor is that you don't need an additional check every 
>> time the normal()
>> function is called. But because we will also have a struct 
>> based API, that will
>> not require such checks (at least not for all engines), this 
>> isn't really that
>> important. So we can also initialize the global engine 
>> instance in a call to
>> normal(), if this simplifies things.

> I guess my feeling here is that the values generated by an RNG 
> should depend on when it is called, and not at all on when it 
> is instantiated.
>
> i.e. if I do something like
>
>     auto nrng = Normal!()(0, 1);
>     writeln( uniform(0.0, 1.0) );
>     writeln( uniform(0.0, 1.0) );
>     writeln( nrng() );
>     writeln( nrng() );
>
> I should get the same output as if I do,
>
>     writeln( uniform(0.0, 1.0) );
>     writeln( uniform(0.0, 1.0) );
>     auto nrng = Normal!()(0, 1);
>     writeln( nrng() );
>     writeln( nrng() );
>
> You can also think that if I change from e.g.
>
>     auto nrng = Normal!(real, Engine1)(0, 1);
>     writeln( uniform(0.0, 1.0) );
>     writeln( uniform(0.0, 1.0) );
>     writeln( nrng() );
>     writeln( nrng() );
>
> to
>
>     auto nrng = Normal!(real, Engine2)(0, 1);
>     writeln( uniform(0.0, 1.0) );
>     writeln( uniform(0.0, 1.0) );
>     writeln( nrng() );
>     writeln( nrng() );
>
> ... then I would expect to see different results from the 
> normal RNG but identical results from uniform().  If the 
> constructor of the normal engine calls the RNG, the uniform() 
> results will change, no?

I was only talking about the part of initialization that doesn't 
use a RNG. I agree that everything that uses a RNG should be done 
in opCall (or inside a normal() function in the function 
interface). For Box-Muller, I think the approach you currently 
use in NormalBoxMullerEngine is the most reasonable one.
But a Ziggurat engines needs to compute some tables before it can 
start generating samples. It doesn't need a RNG to do that and 
the tables do not change after initialization.

I think it's obvious that that initialization that doesn't need a 
RNG should be done in Normal's constructor for the struct 
interface. What is not so obvious is where the initialization of 
static data that doesn't require a RNG should be done for 
function interface. That initialization can be done in normal() 
function, or it can be done in a static constructor. If you do it 
in normal(), you need to do an extra check on each call to 
normal(). This isn't really a problem as long as the struct's 
opCall and the version of normal() that takes the engine as a 
parameter don't do such redundant checks. Then the users that 
care about the difference in performance can just use one of 
these interfaces.

> Yes, a random.d test suite probably should be another project.
>
> Regardless of tests, let's focus for now on getting the API 
> right for this case of 
> non-uniform-random-number-generator-with-internal-engine, with 
> normal and exponential as the initial cases.

In general I like the API in file normal.d attached to your 
original post. I think the engines should have an option to do 
some initialization in Normal's constructor, though. We could 
achieve that by calling _engine.initialize in Normal's 
constructors, if such method exists. This method would also need 
to be called on the static instance of normal engine used in the 
normal() function. We could add something like this to the first 
version of normal:

static if(is(typeof(engine.initialize())))
{
     static bool isInitialized;
     if(!isInitialized)
         engine.initialize();
}

Another option would be to do this:

struct GlobalEngine(Engine)
{
     Engine instance;

     static this()
     {
         instance.initialize();
     }
}

And then inside the version of normal that doesn't take an engine 
as a parameter:

alias GlobalEngine!NormalRandomNumberEngine E;
return normal(mean, sigma, E.instance, urng);

The users would need to construct their own instance of engine to 
use the function that takes engine as a parameter. So it would 
make sense to add helper functions for creating engine instances.

There's one change that I think would make the API more 
convenient. Normal struct and the engine don't store an instance 
of a RNG , so they don't need to take it as a template parameter. 
We could make opCall methods templates instead. That way the 
users would never need to explicitly specify the type of the RNG.