Sampling algorithms for D
Joseph Rushton Wakeling
joseph.wakeling at webdrake.net
Thu Apr 12 14:48:54 PDT 2012
On 12/04/12 21:54, bearophile wrote:
> Some comments on the details of your code:
Thanks ever so much for the extensive review.
>> import std.c.time;
>
> In D there there is also the syntax:
>> import std.c.time: foo, bar, baz;
>
> That tells the person that reads the code what names are used.
Is this advised for all D modules, or just for stuff using the C standard library?
> Generally where possible add "const", "pure" and "nothrow" attributes to functions/methods every where you can (and in some cases @safe too, if you want):
>
> final size_t records_remaining() const pure nothrow { return recordsRemaining; }
> final size_t records_total() const pure nothrow { return recordsTotal; }
Whether I can do this would depend on whether I put in place a re-initialization
function to allow the sampler to be reset with new values, no?
> All D variables are initialized (unless you ask them to not be initialized, using "= void"), so the first "= 0" is not needed (but initializing D floating point values to zero is often necessary):
I want to be _certain_ that value is zero. ;-)
> In D there is also the ^^ operator.
Ahh, another thing I once knew but had forgotten since the last time I looked at D.
>> for( t=recordsRemaining-1; t>=limit; --t)
>> y2 *= top--/bottom--;
>
> Generally packing mutation of variables inside expressions is quite bad style. It makes code less easy to understand and translate, and currently it's not defined, just as in C (it will be defined, but it will keep being bad style).
OK. There's one other that should be a point of concern, where I have
return currentRecord++;
... in one function; I've added a selectedRecord variable to get round this.
final size_t select(ref UniformRNG urng)
{
assert(_recordsRemaining > 0);
assert(_sampleRemaining > 0);
immutable size_t S = skip(urng);
immutable size_t selectedRecord = _currentRecord + S;
_sampleRemaining--;
_recordsRemaining -= (S+1);
_currentRecord += (S+1);
return selectedRecord;
}
> Also in D there is (I hope it will not be deprecated) foreach_reverse(), maybe to replace this for().
I'll see what I can do with that. There may be a better way of expressing this
anyway -- as is it's pretty much a literal transcription of the Pascal-esque
pseudo-code in the original research paper describing the algorithm.
>> sampling_test_simple!(VitterA!Random,Random)(100,5,urng);
>
> Currently your code doesn't work if you want to use a Xorshift generator.
Ack, enabling that is precisely why I bothered templatizing it in the first
place. Can you suggest a fix ... ? I don't understand why as-is it wouldn't work.
> ----------------------
>
>> sampling_test_aggregate!(VitterA!Random,Random)(100,5,urng,10000000,true);
>> writeln();
>> sampling_test_aggregate!(VitterD!Random,Random)(100,5,urng,10000000,true);
>> writeln();
>
> I suggest to define one enum inside the main(), like this, with underscores too, to improve readability, and use it in all the function calls:
>
> enum int N = 10_000_000;
> samplingTestAggregate!(VitterA!Random, Random)(100, 5, urng, N, true);
Fair enough. I'd forgotten about the underscore notation for large numbers.
This stuff I'm not too worried about, because it's only a stupid demo thing, not
the important code ... :-)
Thanks again for the careful review, it's been very useful.
Best wishes,
-- Joe
More information about the Digitalmars-d-learn
mailing list