randomShuffle

Mon Jun 3 06:18:21 PDT 2013

On 06/03/2013 02:30 PM, Joseph Rushton Wakeling wrote:
> On 06/03/2013 01:29 PM, Diggory wrote:
>> For small samples from very large ranges an efficient algorithm would be:
>>
>> int[] randomGen(int N, int M) {
>>     if (N == 0)
>>         return [];
>>
>>     int[] result = randomGen(N-1, M-1);
>>
>>     int num = rand(M);
>>     foreach (ref int i; result)
>>         if (i >= num)
>>             ++i;
>>
>>     result ~= num;
>>
>>     return result;
>> }
> 
> Are you sure about the efficiency of that algorithm?  Looks like it's be O(M) to me.

Apologies, I misread the algorithm slightly.  Your qualifier is quite correct --
when the number of sample points is very small (e.g. 5 out of some arbitrary
very large number), that algorithm is very quick indeed (I ran some tests as I
was curious:-), and can outperform D's randomSample.

It doesn't scale with the size of the input (your value M), but with the number
of sample points n, but that scaling is very bad -- so as the sample size grows
it becomes _much_ worse than randomSample.

Do you have a reference for the algorithm?  Off the top of my head I don't
recognize it from any of the texts I've read.