Random sampling in Phobos

Tue Apr 17 09:25:55 PDT 2012

On 17/04/12 17:31, Andrei Alexandrescu wrote:
> Absolutely. We could and should integrate Vitter's algorithm; the only reason
> it's not there yet is because I didn't have the time.

Then I'll get to work on an implementation.

> That is correct. However the distinction is often academic because the cost of
> generating random numbers is lower than the cost of scanning the data, and
> skipping N steps in the data is comparable to the cost of skipping one step N
> times.

Are we assuming here data with forward access, rather than random access?

My view on the whole setup has probably been skewed by the fact that my sampling 
requirement was based on a setup like,

   (1) generate new sample point index value (doesn't require any data scanning,
       just add skip value to last selected index value);

   (2) do something with that index value.

> Actually that's not correct. RandomSample works fine with an input range and
> does not keep it in memory.

Ahh, OK.  I should have anticipated this as the output also works as a range. 
In that case my "I just need the index values" concern is moot, as I can do,

    randomSample(iota(0,N), n);

I'm starting to appreciate more and more why you emphasize so much the cool-ness 
of ranges in your talks on D. :-)

> This would be great, but you'd need to show with benchmarks that the proposed
> implementation does better than the extant one for most cases that matter.

Well, if you can give me an idea of the cases that matter to you, I'll benchmark 
'em. :-)  In any case, I'll get on with the implementation.

Thanks and best wishes,

     -- Joe