Normal/Gaussian random number generation for D

jerro a at a.com
Fri Oct 26 14:03:11 PDT 2012


> I'm not expert on this, but there are various tests described 
> in this paper which compares different normal random number 
> generation techniques:
> http://www.cse.cuhk.edu.hk/%7Ephwl/mt/public/archives/papers/grng_acmcs07.pdf

I'll implement the tests described in this paper.

> ... and the topic is also discussed in this paper:
> http://www.doornik.com/research/ziggurat.pdf

> I don't have sufficient knowhow to really feel confident about 
> how to test these things though, and part of me feels it might 
> be worth at some point approaching some stats experts and 
> asking them to help define a test suite for D's random number 
> functionality.

Maybe there are testsuites already written for this.

> Well, as I defined the function, the UniformRNG input has a 
> default value of rndGen, so if you call it just as normal(mean, 
> sigma); it should work as intended.  But if there's some factor 
> which means it would be better to define a separate function 
> which doesn't receive the RNG as input, I can do that.

I don't see any downside to this.

> I think it probably depends on the details of the technique.  
> E.g. with Box-Muller you cache 2 random values, and you need to 
> re-generate them for every pair of normal random variates the 
> user requests, so there's no problem with having it initialized 
> in opCall on the first call to the engine.
>
> In general I favour that approach of initializing in opCall 
> because it means that you don't have to ever pass anything to 
> the constructor and also if I have a simulation which uses a 
> lot of random numbers, its output won't be changed simply by 
> creating an instance of a normal random number generator -- 
> only when I start actually _using_ that generator in place of 
> some other source of randomness.

I have only been thinking about the Ziggurat algorithm, but you 
are right, it does depend on the details of the technique. For 
Box-Muller (and other engines that cache samples) it only makes 
sense to compute the first samples in the opCall. But for the 
Ziggurat algorithm, tables that must be computed before you can 
start sampling aren't changed during sampling and computing the 
tables doesn't require any additional arguments. So it makes the 
most sense for those tables to be initialized in the struct's 
constructor in the struct based API.

In my previous post I was talking about initializing static 
instances of the engine used in the normal() function. The 
advantage of initializing in a static constructor is that you 
don't need an additional check every time the normal() function 
is called. But because we will also have a struct based API, that 
will not require such checks (at least not for all engines), this 
isn't really that important. So we can also initialize the global 
engine instance in a call to normal(), if this simplifies things.

> Did you get the code I attached to my original post on this 
> topic?  Is there something extra that you need?

I didn't notice the attachments, sorry (but I see them now). It 
seems it will be easy enough to write NormalZigguratEngine, 
there's just one thing that I think should be added to the Normal 
struct. Engines should have a way to do their initialization in 
Normal's constructor (where possible), so I think something like 
this:

static if(is(typeof(_engine.initialize())))
     _engine.initialize();

should be added to Normal's constructor. Or maybe

_engine = NormalRandomNumberEngine();

so that a static opCall can be used.

> I'm going to turn that into patches for Phobos which I'll put 
> up on GitHub in the next days, so we can pull and push and 
> test/write together as needed.

Maybe we should also have a separate file in a separate branch 
for tests. There will probably be a lot of code needed to test 
this well and the tests could take a long time to run, so I don't 
think they should go into unit test blocks (we can put some 
simpler tests in unit test blocks later). It may also be useful 
to use external libraries such as dstats for tests.


More information about the Digitalmars-d mailing list