Shuffle

Mon Jan 28 11:04:15 PST 2008

Sean Kelly wrote:
> Bill Baxter wrote:
>> It looks like Tango's shuffle doesn't attempt to eliminate the modulo
>> bias in generating a random number.  So it doesn't have attempt to
>> calculate any "n" or "max" to begin with.
> 
> The Tango shuffle uses the Knuth-Fischer-Yates shuffle algorithm but
> iterates from 0 .. N rather than N .. 0.  It may be that reversing the
> direction causes some bias and that it really has to be top-down
> however--I'm planning on reading up on the algorithm today to be sure.
> I actually stumbled on the current implementation by accident simply by
> following the requirements for random_shuffle in the C++ spec.

It really needs to be top-down with that loop body.
The way tango.core.Array.shuffle is implemented[1], position n may be 
swapped with something else after having received its random replacement 
if n comes up as a random number afterwards.
Changing
     for( size_t pos = 2, end = buf.length; pos < buf.length; ++pos )
to
     for( size_t pos = buf.length - 1; pos > 0; --pos )
should fix it.

However, it's possible to implement a correct shuffle with an 
upwards-counting loop. The loop body should then be something like
     buf.swap( n, n + Rand(buf.length - n) );
(assuming 0 <= Rand(N) < N)
so that the random argument is always >= n. This is basically a reversed 
Knuth-Fischer-Yates shuffle.

Both versions would be better if the randomizer eliminated modulo bias, 
though :P.

[1]: At least, according to 
http://www.dsource.org/projects/tango/docs/current/source/tango.core.Array.html