What is the use case of RandomCover?

Mon Feb 18 06:35:43 PST 2013

I'm unsure whether I should post into the ".learn" sub-forum or 
some other one in such a case, but still.

I wonder what is the use case of std.random.randomCover when one 
has std.random.randomShuffle.  I was trying to use it just to get 
a random permutation of integers, and found randomCover prior to 
randomShuffle.  However, for the number of elements as low as 
10,000, the delay was already rather surprising, so I searched 
for a faster solution and found randomShuffle does a superior 
job.  And now I wonder: how does one correctly use randomCover?  
Below is a sample test program showing the difference.

-----
import std.array;
import std.random;
import std.range;
import std.stdio;

immutable int MAX_N = 10_000;

int [] fun (int n, ref Random rnd)
{
	auto t = array (iota (MAX_N));
	version (randomCover)
	{
		auto c = randomCover (t, rnd);
	}
	version (randomShuffle)
	{
		auto c = t;
		randomShuffle (c, rnd);
	}

	return array (c);
}

void main ()
{
	auto rnd = Random (123456789);
	writeln (fun (MAX_N, rnd));
	writeln (fun (MAX_N, rnd) == fun (MAX_N, rnd));
}
-----

Here is a comparison:

1. Speed.
+randomShuffle performs O(n) steps and O(n) uniform() calls.
-randomCover performs O(n^2) steps and O(n^2) uniform() calls.

The latter however can (and perhaps should?) be optimized to 
O(n): in the implementation, the line
             auto chooseMe = uniform(0, k, _rnd) == 0;
can be moved outside the foreach loop and store the integer
         auto toPick = uniform(0, k, _rnd);
instead of a bool.  I can try and write the respective patch if 
needed.

2. Size.
-randomShuffle does not allocate anything extra, but modifies the 
range in place, and so requires allocating another range of n 
values if the original range has to be stored too.
+randomCover only allocates an array of n bools.  If that is the 
intended advantage, the implementation would be better off using 
a bit array instead of a bool array, as in this enhancement 
proposition: http://d.puremagic.com/issues/show_bug.cgi?id=2898

3. Laziness.
-randomShuffle just does its job once.
+randomCover produces some sort of a lazy generator instead.  
Still, the generator performs an O(n) computation on each step, 
so the profit is debatable.

4. Convenience.
+randomShuffle called multiple times with the same RNG advances 
the internal state of the RNG and thus produces different 
results.  If one needs the same results, it is still achievable 
by knowingly saving and loading the internal state of the RNG.
-randomCover called multiple times with the same RNG copies the 
RNG each time by value and thus produces the same result.  That 
is not the intended behavior in the majority of use cases I can 
imagine (e.g., generating different random permutations in a 
loop).  This is already the topic of an issue I found: 
http://d.puremagic.com/issues/show_bug.cgi?id=7067

Now, the only case I can think of where randomCover should be 
preferred to randomShuffle is when you have a huge range 
(hundreds of Mb), but you need to iterate only through the first 
few values in randomCover.  Is there any other?

Whether the above is indeed the intended use of randomCover or 
not, I think that the intended use (and a reference to 
randomShuffle for other cases) should be mentioned in the 
documentation along with the time complexity.

More on the topic of optimization, the performance of the whole 
randomCover thing can be optimized to O(n log n) using a Fenwick 
tree or such to popFront in O(log n).  But it will then require 
storing n integers, not n bools, thus losing the advantage of 
having smaller memory requirements than randomShuffle with copy.

-----
Ivan Kazmenko.