Shuffle

Walter Bright newshound1 at digitalmars.com
Fri Jan 25 12:05:07 PST 2008


Frits van Bommel wrote:
> As mentioned by Oskar, this won't generate uniformly random lists.
> However, both your algorithm and his version have an additional flaw: 
> both shuffle the entire list when you only need a random prefix (i.e. 
> you only need to select as many random files as will fit onto your 
> second disk).
> 
> Try this small modification to the original algorithm instead:
> -----
> 
>     /* The loop will normally quit via an exception when the target
>      * device is full. But if there are not enough files in the source
>      * to fill up the target, the loop ensures it will still eventually
>      * quit.
>      */
>     while (files.length > 0)
>     {
>         auto j = std.random.rand() % files.length;
>         auto fromfile = files[j];
>         auto tofile = std.path.join(todir, basename(fromfile));
>         writefln("%s => %s", fromfile, tofile);
>         std.file.copy(fromfile, tofile);
> 
>         // Remove copied file from the list
>         files[j] = files[$-1];
>         files = files[0 .. $-1];
>     }
> -----

You're right, but I prefer having the shuffle bit be separate so people 
will see how to do a shuffle.

> Though technically, this will still prefer the files at the front of the 
> list during each iteration because (rand() % files.length) is more 
> likely to be a number <= (RAND_MAX/files.length) than a larger number. 
> (In fact, for RAND_MAX < files.length the higher-ups will *never* be 
> chosen during that iteration)
> However, this shouldn't be a significant problem as long as your number 
> of files is much less than RAND_MAX; and RAND_MAX appears to be uint.max 
> for std.random, so for any reasonable music collection it should be 
> close enough.

You're right again. (This algorithm is obviously harder than it looks!)

> To fix that, ignore the result of rand() if it's > ((RAND_MAX / 
> files.length) * files.length), for example by replacing the first line 
> of the loop with:
> ---
>     uint r = std.random.max;
>     // assumes RAND_MAX == uint.max
>     auto max = (uint.max / files.length) * files.length;
>     while (r > max)
>         r = std.random.rand();
>     auto j = r % files.length;
> ---
> (also, add "assert(files.length <= RAND_MAX);" before the loop if 
> RAND_MAX < uint.max)

I think it's (r >= max).


More information about the Digitalmars-d-announce mailing list