Sampling algorithms for D

bearophile bearophileHUGS at lycos.com
Thu Apr 12 12:54:01 PDT 2012


Joseph Rushton Wakeling:

> https://github.com/WebDrake/SampleD

Some comments on the details of your code:

> import std.c.time;

In D there there is also the syntax:
> import std.c.time: foo, bar, baz;

That tells the person that reads the code what names are used.

----------------------

> interface Sampler(UniformRandomNumberGenerator) {

The "UniformRNG" name induces shorter lines in your module and I think it's easy enough to understand.

----------------------

> double uniform_open(UniformRandomNumberGenerator)
> (double a, double b, ref UniformRandomNumberGenerator urng)
> {

In D function/method names are pascalCased. I also suggest to indent the second line, and unless you plan to modifying a and b inside that, to always mark them as "in" (or const, or even immutable):

double uniformOpen(UniformRNG)
                  (double a, double b, ref UniformRNG urng)
{

----------------------

>    do {
>        x = uniform(a,b,urng);
>    } while (x==a);

I suggest to put a space after a comma and before and after an operator:

    do {
        x = uniform(a, b, urng);
    } while (x == a);

----------------------

> class Skipper(UniformRandomNumberGenerator) : Sampler!(UniformRandomNumberGenerator) {

When there is only one template argument you are allowed to omit the (), and generally in English there is no need for a space before a ":":

class Skipper(UniformRNG): Sampler!UniformRNG {

----------------------

>        size_t S = skip(urng);
>
>        sampleRemaining--;
>        recordsRemaining -= (S+1);
>        currentRecord += S;

Unless you plan to modify a variable, then on default define it as immutable, and where that's not possible, as const, this avoids mistakes later, makes the code simpler to understand, and helps the compiler too:

        immutable size_t S = skip(urng);

        sampleRemaining--;
        recordsRemaining -= S + 1;
        currentRecord += S;

----------------------

>     final size_t records_remaining() { return recordsRemaining; }
>     final size_t records_total() { return recordsTotal; }

Generally where possible add "const", "pure" and "nothrow" attributes to functions/methods every where you can (and in some cases @safe too, if you want):

     final size_t records_remaining() const pure nothrow { return recordsRemaining; }
     final size_t records_total() const pure nothrow { return recordsTotal; }

----------------------

> protected:
>     size_t currentRecord = 0;
>     size_t recordsRemaining;

All D variables are initialized (unless you ask them to not be initialized, using "= void"), so the first "= 0" is not needed (but initializing D floating point values to zero is often necessary):

protected:
    size_t currentRecord;
    size_t recordsRemaining;

----------------------

>                 y1 = pow( (uniform_open(0.0,1.0, urng) * (cast(double) recordsRemaining) / qu1),
>                           (1.0/(sampleRemaining - 1)) );

In D there is also the ^^ operator.

----------------------

>                     for( t=recordsRemaining-1; t>=limit; --t)
>                         y2 *= top--/bottom--;

Generally packing mutation of variables inside expressions is quite bad style. It makes code less easy to understand and translate, and currently it's not defined, just as in C (it will be defined, but it will keep being bad style).

                    for (t = recordsRemaining - 1; t >= limit; t--) {
                        y2 *= top / bottom;
                        top--;
                        bottom--;
                    }


Also in D there is (I hope it will not be deprecated) foreach_reverse(), maybe to replace this for().

----------------------

>     writeln("Hello!  I'm a very simple and stupid implementation of some clever");
>     writeln("sampling algorithms.");
>     writeln();
>     writeln("What I'm going to do first is just take a sample of 5 records out of");
>     writeln("100 using each of the algorithms.");
>     writeln();

D string literals are multi-line too:

writeln(
"Hello!  I'm a very simple and stupid implementation of some clever
sampling algorithms.

What I'm going to do first is just take a sample of 5 records out of
100 using each of the algorithms.
");

----------------------

>     sampling_test_simple!(VitterA!Random,Random)(100,5,urng);

Currently your code doesn't work if you want to use a Xorshift generator.

----------------------

>     sampling_test_aggregate!(VitterA!Random,Random)(100,5,urng,10000000,true);
>     writeln();
>     sampling_test_aggregate!(VitterD!Random,Random)(100,5,urng,10000000,true);
>     writeln();

I suggest to define one enum inside the main(), like this, with underscores too, to improve readability, and use it in all the function calls:

    enum int N = 10_000_000;
    samplingTestAggregate!(VitterA!Random, Random)(100, 5, urng, N, true);

Bye,
bearophile


More information about the Digitalmars-d-learn mailing list