randomShuffle
Diggory
diggsey at googlemail.com
Mon Jun 3 10:00:18 PDT 2013
On Monday, 3 June 2013 at 13:18:30 UTC, Joseph Rushton Wakeling
wrote:
> On 06/03/2013 02:30 PM, Joseph Rushton Wakeling wrote:
>> On 06/03/2013 01:29 PM, Diggory wrote:
>>> For small samples from very large ranges an efficient
>>> algorithm would be:
>>>
>>> int[] randomGen(int N, int M) {
>>> if (N == 0)
>>> return [];
>>>
>>> int[] result = randomGen(N-1, M-1);
>>>
>>> int num = rand(M);
>>> foreach (ref int i; result)
>>> if (i >= num)
>>> ++i;
>>>
>>> result ~= num;
>>>
>>> return result;
>>> }
>>
>> Are you sure about the efficiency of that algorithm? Looks
>> like it's be O(M) to me.
>
> Apologies, I misread the algorithm slightly. Your qualifier is
> quite correct --
> when the number of sample points is very small (e.g. 5 out of
> some arbitrary
> very large number), that algorithm is very quick indeed (I ran
> some tests as I
> was curious:-), and can outperform D's randomSample.
>
> It doesn't scale with the size of the input (your value M), but
> with the number
> of sample points n, but that scaling is very bad -- so as the
> sample size grows
> it becomes _much_ worse than randomSample.
>
> Do you have a reference for the algorithm? Off the top of my
> head I don't
> recognize it from any of the texts I've read.
Thanks for testing before dismissing completely :P The way it
returns results can be improved a lot by pre-allocating a range
of the necessary size/using a range passed in.
The complexity is O(N²) where N is the number of samples out of
M. The algorithm is something I just thought of although I've
possibly used it before and I'm sure someone else has thought of
it too. It's quite simple, it effectively says "pick a value from
the range except for this value" recursively but instead of
dividing the range in two, it shifts the top part down first to
make a contiguous range and then shifts the results which are
past the split up one afterward.
I expect the phobos implementation to be something like O(M) in
which case my method would be faster whenever N < sqrt(M) (with
the optimisations)
More information about the Digitalmars-d-learn
mailing list