Sampling algorithms for D

Thu Apr 12 14:48:54 PDT 2012

On 12/04/12 21:54, bearophile wrote:
> Some comments on the details of your code:

Thanks ever so much for the extensive review.

>> import std.c.time;
>
> In D there there is also the syntax:
>> import std.c.time: foo, bar, baz;
>
> That tells the person that reads the code what names are used.

Is this advised for all D modules, or just for stuff using the C standard library?

> Generally where possible add "const", "pure" and "nothrow" attributes to functions/methods every where you can (and in some cases @safe too, if you want):
>
>       final size_t records_remaining() const pure nothrow { return recordsRemaining; }
>       final size_t records_total() const pure nothrow { return recordsTotal; }

Whether I can do this would depend on whether I put in place a re-initialization 
function to allow the sampler to be reset with new values, no?

> All D variables are initialized (unless you ask them to not be initialized, using "= void"), so the first "= 0" is not needed (but initializing D floating point values to zero is often necessary):

I want to be _certain_ that value is zero. ;-)

> In D there is also the ^^ operator.

Ahh, another thing I once knew but had forgotten since the last time I looked at D.

>>                      for( t=recordsRemaining-1; t>=limit; --t)
>>                          y2 *= top--/bottom--;
>
> Generally packing mutation of variables inside expressions is quite bad style. It makes code less easy to understand and translate, and currently it's not defined, just as in C (it will be defined, but it will keep being bad style).

OK.  There's one other that should be a point of concern, where I have

    return currentRecord++;

... in one function; I've added a selectedRecord variable to get round this.

       final size_t select(ref UniformRNG urng)
       {
             assert(_recordsRemaining > 0);
             assert(_sampleRemaining > 0);

             immutable size_t S = skip(urng);
             immutable size_t selectedRecord = _currentRecord + S;

             _sampleRemaining--;
             _recordsRemaining -= (S+1);
             _currentRecord += (S+1);

             return selectedRecord;
       }

> Also in D there is (I hope it will not be deprecated) foreach_reverse(), maybe to replace this for().

I'll see what I can do with that.  There may be a better way of expressing this 
anyway -- as is it's pretty much a literal transcription of the Pascal-esque 
pseudo-code in the original research paper describing the algorithm.

>>      sampling_test_simple!(VitterA!Random,Random)(100,5,urng);
>
> Currently your code doesn't work if you want to use a Xorshift generator.

Ack, enabling that is precisely why I bothered templatizing it in the first 
place.  Can you suggest a fix ... ?  I don't understand why as-is it wouldn't work.

> ----------------------
>
>>      sampling_test_aggregate!(VitterA!Random,Random)(100,5,urng,10000000,true);
>>      writeln();
>>      sampling_test_aggregate!(VitterD!Random,Random)(100,5,urng,10000000,true);
>>      writeln();
>
> I suggest to define one enum inside the main(), like this, with underscores too, to improve readability, and use it in all the function calls:
>
>      enum int N = 10_000_000;
>      samplingTestAggregate!(VitterA!Random, Random)(100, 5, urng, N, true);

Fair enough.  I'd forgotten about the underscore notation for large numbers. 
This stuff I'm not too worried about, because it's only a stupid demo thing, not 
the important code ... :-)

Thanks again for the careful review, it's been very useful.

Best wishes,

     -- Joe