What is the use case of RandomCover?
Ivan Kazmenko
gassa at mail.ru
Mon Feb 18 06:35:43 PST 2013
I'm unsure whether I should post into the ".learn" sub-forum or
some other one in such a case, but still.
I wonder what is the use case of std.random.randomCover when one
has std.random.randomShuffle. I was trying to use it just to get
a random permutation of integers, and found randomCover prior to
randomShuffle. However, for the number of elements as low as
10,000, the delay was already rather surprising, so I searched
for a faster solution and found randomShuffle does a superior
job. And now I wonder: how does one correctly use randomCover?
Below is a sample test program showing the difference.
-----
import std.array;
import std.random;
import std.range;
import std.stdio;
immutable int MAX_N = 10_000;
int [] fun (int n, ref Random rnd)
{
auto t = array (iota (MAX_N));
version (randomCover)
{
auto c = randomCover (t, rnd);
}
version (randomShuffle)
{
auto c = t;
randomShuffle (c, rnd);
}
return array (c);
}
void main ()
{
auto rnd = Random (123456789);
writeln (fun (MAX_N, rnd));
writeln (fun (MAX_N, rnd) == fun (MAX_N, rnd));
}
-----
Here is a comparison:
1. Speed.
+randomShuffle performs O(n) steps and O(n) uniform() calls.
-randomCover performs O(n^2) steps and O(n^2) uniform() calls.
The latter however can (and perhaps should?) be optimized to
O(n): in the implementation, the line
auto chooseMe = uniform(0, k, _rnd) == 0;
can be moved outside the foreach loop and store the integer
auto toPick = uniform(0, k, _rnd);
instead of a bool. I can try and write the respective patch if
needed.
2. Size.
-randomShuffle does not allocate anything extra, but modifies the
range in place, and so requires allocating another range of n
values if the original range has to be stored too.
+randomCover only allocates an array of n bools. If that is the
intended advantage, the implementation would be better off using
a bit array instead of a bool array, as in this enhancement
proposition: http://d.puremagic.com/issues/show_bug.cgi?id=2898
3. Laziness.
-randomShuffle just does its job once.
+randomCover produces some sort of a lazy generator instead.
Still, the generator performs an O(n) computation on each step,
so the profit is debatable.
4. Convenience.
+randomShuffle called multiple times with the same RNG advances
the internal state of the RNG and thus produces different
results. If one needs the same results, it is still achievable
by knowingly saving and loading the internal state of the RNG.
-randomCover called multiple times with the same RNG copies the
RNG each time by value and thus produces the same result. That
is not the intended behavior in the majority of use cases I can
imagine (e.g., generating different random permutations in a
loop). This is already the topic of an issue I found:
http://d.puremagic.com/issues/show_bug.cgi?id=7067
Now, the only case I can think of where randomCover should be
preferred to randomShuffle is when you have a huge range
(hundreds of Mb), but you need to iterate only through the first
few values in randomCover. Is there any other?
Whether the above is indeed the intended use of randomCover or
not, I think that the intended use (and a reference to
randomShuffle for other cases) should be mentioned in the
documentation along with the time complexity.
More on the topic of optimization, the performance of the whole
randomCover thing can be optimized to O(n log n) using a Fenwick
tree or such to popFront in O(log n). But it will then require
storing n integers, not n bools, thus losing the advantage of
having smaller memory requirements than randomShuffle with copy.
-----
Ivan Kazmenko.
More information about the Digitalmars-d-learn
mailing list