Shuffle

Mon Jan 28 12:05:29 PST 2008

Frits van Bommel wrote:
> Sean Kelly wrote:
>> Bill Baxter wrote:
>>> It looks like Tango's shuffle doesn't attempt to eliminate the modulo
>>> bias in generating a random number.  So it doesn't have attempt to
>>> calculate any "n" or "max" to begin with.
>>
>> The Tango shuffle uses the Knuth-Fischer-Yates shuffle algorithm but
>> iterates from 0 .. N rather than N .. 0.  It may be that reversing the
>> direction causes some bias and that it really has to be top-down
>> however--I'm planning on reading up on the algorithm today to be sure.
>> I actually stumbled on the current implementation by accident simply by
>> following the requirements for random_shuffle in the C++ spec.
> 
> It really needs to be top-down with that loop body.
> The way tango.core.Array.shuffle is implemented[1], position n may be
> swapped with something else after having received its random replacement
> if n comes up as a random number afterwards.

Yup.  I just wasn't sure if this would be a problem, but it looks like
it will be.  I also had an off-by-one error in the limit value being
passed to rand, which actually turned it into Sattolo's algorithm.  Go
figure.

> Both versions would be better if the randomizer eliminated modulo bias,
> though :P.

I just fixed that as well I think.  Let me know if I screwed it up:

http://dsource.org/projects/tango/changeset/3139

(A change to tango.stdc.posix.sys.stat creeped in as well.  Ignore that.)

Sean