randomShuffle
Diggory
diggsey at googlemail.com
Mon Jun 3 11:28:59 PDT 2013
On Monday, 3 June 2013 at 17:35:22 UTC, Joseph Rushton Wakeling wrote:
> On 06/03/2013 07:00 PM, Diggory wrote:
>> Thanks for testing before dismissing completely :P The way it returns results can be improved a lot by pre-allocating a range of the necessary size, or by using a range passed in.
>
> Surely. :-)
>
>> The complexity is O(N²), where N is the number of samples out of M. The algorithm is something I just thought of, although I've possibly used it before and I'm sure someone else has thought of it too. It's quite simple: it effectively says "pick a value from the range except for this value" recursively, but instead of dividing the range in two, it shifts the top part down first to make a contiguous range, and then shifts the results which are past the split up afterward.
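
For concreteness, here's a minimal sketch of that idea -- this is not the code from the earlier post, and sampleDistinct/picked are names I've made up here. It draws n distinct indices from 0 .. m (assuming n <= m), using one uniform variate per sample:

import std.random : uniform;

// Sketch only: draw n distinct indices from 0 .. m without
// replacement. One random variate per sample; the shifting makes
// the total work O(N²).
size_t[] sampleDistinct(size_t n, size_t m)
{
    size_t[] picked;              // values chosen so far, kept sorted
    auto result = new size_t[n];  // samples, in the order drawn
    foreach (i; 0 .. n)
    {
        // Pick from the m - i values not yet chosen, as if the chosen
        // ones had been shifted out to leave one contiguous range.
        auto r = uniform(0, m - i);
        // Shift r back up past every chosen value at or below it,
        // recovering its position in the original 0 .. m range.
        size_t pos = 0;
        while (pos < picked.length && picked[pos] <= r)
        {
            ++r;
            ++pos;
        }
        picked = picked[0 .. pos] ~ r ~ picked[pos .. $];  // stay sorted
        result[i] = r;
    }
    return result;
}

Each step costs O(i) for the shifting, so the whole thing is O(N²), but it consumes only N random variates and never has to walk all M values.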
>
> It's an interesting little piece of invention. I'll have to look around and see if there's anything similar documented. I've not seen anything in the literature which has this particular property of O(N) random variates plus O(N²) calculation.
>
> The fact that it's not sequential might have been a bit of an issue historically, which maybe explains why I haven't come across something like it.
>
>> I expect the Phobos implementation to be something like O(M), in which case my method would be faster whenever N < sqrt(M) (with the optimisations).
>
> No, the Phobos implementation is O(N) -- I wrote it :-) See: http://braingam.es/2012/07/sampling-d/
>
> The API is Andrei's -- the original implementation used what Knuth refers to as Algorithm S (which is really simple and easy to implement and check, but is indeed O(M)); I updated it to use Algorithm D (which is complicated :-).
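
For reference, a minimal usage sketch of the Phobos API (the range and sample size here are just made-up numbers):

import std.random : randomSample;
import std.range : iota;
import std.stdio : writeln;

void main()
{
    // Draw 50 of the values 0 .. 1_000_000 without replacement.
    // randomSample consumes O(N) random variates and skips over the
    // rest of the input, yielding the sample in increasing order.
    auto sample = randomSample(iota(1_000_000), 50);
    writeln(sample);
}

Note that the output comes out sequential (increasing), which is exactly the property my algorithm doesn't have.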
>
> So, since randomSample takes kN time where k is constant, your algorithm will probably win where N < k.
>
> I found that the sampling time was about equal with N = 50 and your algorithm as it's currently written.
I'd guess that the heavy use of floating-point arithmetic to calculate the step sizes means Algorithm D has a fairly large constant overhead, even though its complexity is lower.
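
If anyone wants to measure it, a quick harness along these lines should show the crossover -- a sketch only: it assumes the sampleDistinct sketch from earlier in the thread is in scope, the sizes and iteration count are made up, and benchmark lives in std.datetime.stopwatch in current Phobos:

import std.array : array;
import std.datetime.stopwatch : benchmark;
import std.random : randomSample;
import std.range : iota;
import std.stdio : writefln;

// Assumes sampleDistinct (the O(N²) sketch above) is in scope.
void main()
{
    enum m = 1_000_000;
    enum n = 50;  // roughly where the timings were reported as equal
    auto times = benchmark!(
        () => randomSample(iota(m), n).array,  // Phobos, O(N)
        () => sampleDistinct(n, m)             // the O(N²) sketch
    )(1_000);
    writefln("randomSample:   %s", times[0]);
    writefln("sampleDistinct: %s", times[1]);
}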