Shuffle
Frits van Bommel
fvbommel at REMwOVExCAPSs.nl
Fri Jan 25 07:02:10 PST 2008
Walter Bright wrote:
> /* Shuffle the files[] array
> */
> for (size_t i = 0; i < files.length; i++)
> {
> auto j = std.random.rand() % files.length;
> // swap [i] and [j]
> auto tmp = files[i];
> files[i] = files[j];
> files[j] = tmp;
> }
>
> /* Sequentially fill the target until done or it quits with an
> * exception when the device is full.
> */
> foreach (fromfile; files)
> {
> auto tofile = std.path.join(todir, basename(fromfile));
> writefln("%s => %s", fromfile, tofile);
> std.file.copy(fromfile, tofile);
> }
>
> writefln("Done");
> return 0;
> }
As mentioned by Oskar, this won't generate uniformly random lists.
However, both your algorithm and his version have an additional flaw:
both shuffle the entire list when you only need a random prefix (i.e.
you only need to select as many random files as will fit onto your
second disk).
Try this small modification to the original algorithm instead:
-----
/* The loop will normally quit via an exception when the target
* device is full. But if there are not enough files in the source
* to fill up the target, the loop ensures it will still eventually
* quit.
*/
while (files.length > 0)
{
auto j = std.random.rand() % files.length;
auto fromfile = files[j];
auto tofile = std.path.join(todir, basename(fromfile));
writefln("%s => %s", fromfile, tofile);
std.file.copy(fromfile, tofile);
// Remove copied file from the list
files[j] = files[$-1];
files = files[0 .. $-1];
}
-----
Though technically, this will still prefer the files at the front of the
list during each iteration because (rand() % files.length) is more
likely to be a number <= (RAND_MAX/files.length) than a larger number.
(In fact, for RAND_MAX < files.length the higher-ups will *never* be
chosen during that iteration)
However, this shouldn't be a significant problem as long as your number
of files is much less than RAND_MAX; and RAND_MAX appears to be uint.max
for std.random, so for any reasonable music collection it should be
close enough.
To fix that, ignore the result of rand() if it's > ((RAND_MAX /
files.length) * files.length), for example by replacing the first line
of the loop with:
---
uint r = std.random.max;
// assumes RAND_MAX == uint.max
auto max = (uint.max / files.length) * files.length;
while (r > max)
r = std.random.rand();
auto j = r % files.length;
---
(also, add "assert(files.length <= RAND_MAX);" before the loop if
RAND_MAX < uint.max)
More information about the Digitalmars-d-announce
mailing list