Shuffle

Frits van Bommel fvbommel at REMwOVExCAPSs.nl
Fri Jan 25 07:02:10 PST 2008


Walter Bright wrote:
>     /* Shuffle the files[] array
>      */
>     for (size_t i = 0; i < files.length; i++)
>     {
>     auto j = std.random.rand() % files.length;
>     // swap [i] and [j]
>     auto tmp = files[i];
>     files[i] = files[j];
>     files[j] = tmp;
>     }
> 
>     /* Sequentially fill the target until done or it quits with an
>      * exception when the device is full.
>      */
>     foreach (fromfile; files)
>     {
>     auto tofile = std.path.join(todir, basename(fromfile));
>     writefln("%s => %s", fromfile, tofile);
>     std.file.copy(fromfile, tofile);
>     }
> 
>     writefln("Done");
>     return 0;
> }

As mentioned by Oskar, this won't generate uniformly random lists.
However, both your algorithm and his version have an additional flaw: 
both shuffle the entire list when you only need a random prefix (i.e. 
you only need to select as many random files as will fit onto your 
second disk).

Try this small modification to the original algorithm instead:
-----

     /* The loop will normally quit via an exception when the target
      * device is full. But if there are not enough files in the source
      * to fill up the target, the loop ensures it will still eventually
      * quit.
      */
     while (files.length > 0)
     {
         auto j = std.random.rand() % files.length;
         auto fromfile = files[j];
         auto tofile = std.path.join(todir, basename(fromfile));
         writefln("%s => %s", fromfile, tofile);
         std.file.copy(fromfile, tofile);

         // Remove copied file from the list
         files[j] = files[$-1];
         files = files[0 .. $-1];
     }
-----

Though technically, this will still prefer the files at the front of the 
list during each iteration because (rand() % files.length) is more 
likely to be a number <= (RAND_MAX/files.length) than a larger number. 
(In fact, for RAND_MAX < files.length the higher-ups will *never* be 
chosen during that iteration)
However, this shouldn't be a significant problem as long as your number 
of files is much less than RAND_MAX; and RAND_MAX appears to be uint.max 
for std.random, so for any reasonable music collection it should be 
close enough.

To fix that, ignore the result of rand() if it's > ((RAND_MAX / 
files.length) * files.length), for example by replacing the first line 
of the loop with:
---
     uint r = std.random.max;
     // assumes RAND_MAX == uint.max
     auto max = (uint.max / files.length) * files.length;
     while (r > max)
         r = std.random.rand();
     auto j = r % files.length;
---
(also, add "assert(files.length <= RAND_MAX);" before the loop if 
RAND_MAX < uint.max)


More information about the Digitalmars-d-announce mailing list