What is the use case of RandomCover?

monarch_dodra monarchdodra at gmail.com
Tue Feb 19 04:22:25 PST 2013


On Tuesday, 19 February 2013 at 11:46:54 UTC, Ivan Kazmenko wrote:
> Hi!
>
> Thank you for the reply.
>
>> Hum... randomShuffle and randomSample actually have nothing to 
>> do with each other.
>> <snip>
>
> I'd like to note that my post is about randomCover, not 
> randomSample.  I do see the difference between the purpose of 
> randomSample and randomShuffle.  But randomCover's effect is, 
> at first glance, just a slower version of randomSample 
> wrapped as a lazy generator.

Hum... sorry about that.

>> I also want to comment on your "randomSample" vs 
>> "randomSuffle" implementation suggestion. Keep in mind that:
>> a) randomSample doesn't allocate, whereas yours suggestion 
>> doesn't
>> b) randomSample gives direct access to the elements, whereas 
>> your suggestion doesn't.
>>
>> If you don't care about a) and b), then by all means, dup 
>> away, and get better performance!
>>
>> But think about the fact that you wouldn't be able to do 
>> something like this...
>> <snip>
>>        auto sample = randomSample(arr[], 5);
>>        foreach(ref a; sample)
>>            ++a;
>
> That holds for randomCover, too.  Well, thank you, perhaps 
> that's the difference I was seeking.
>
> If this is the intended difference, well, my proposal to 
> enhance randomCover's performance and usefulness becomes:
>
> 1. Document the usage of randomCover with an example such as 
> above, and refer to randomShuffle as a faster version for 
> simpler use cases.
>
> 2. Optimize the performance by putting Fenwick trees to good 
> use.
>  Currently, randomCover'ing 10,000 elements takes time on the 
> order of one second, and for 100,000 or more elements, it is 
> hardly usable.
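
For concreteness, the Fenwick-tree idea in point 2 could look roughly 
like the sketch below: keep a count of 1 for every element not yet 
emitted, draw a uniform rank among the survivors, and walk down the 
tree to locate it, so each draw costs O(log n) and a full cover 
O(n log n). The fenwickPermutation helper name and the code are 
hypothetical, not anything in Phobos.

//----
import std.random;
import std.stdio;

// Hypothetical helper: a random permutation of the indices 0 .. n,
// drawn one position at a time with a Fenwick (binary indexed) tree.
// tree[i] (1-based) holds partial counts of the not-yet-taken slots.
size_t[] fenwickPermutation(size_t n)
{
    auto tree = new size_t[](n + 1);

    // O(n) build: every slot starts with count 1.
    foreach (i; 1 .. n + 1)
    {
        tree[i] += 1;
        auto j = i + (i & -i);
        if (j <= n)
            tree[j] += tree[i];
    }

    // Highest power of two not exceeding n, for the top-down descent.
    size_t topBit = 1;
    while (topBit * 2 <= n)
        topBit *= 2;

    auto result = new size_t[](n);
    foreach (step; 0 .. n)
    {
        // 0-based rank among the remaining elements.
        auto rank = uniform(0, n - step);

        // Descend: stop at the last position whose prefix count
        // is still <= rank; the next slot is the chosen one.
        size_t pos = 0;
        for (auto bit = topBit; bit != 0; bit >>= 1)
        {
            auto next = pos + bit;
            if (next <= n && tree[next] <= rank)
            {
                rank -= tree[next];
                pos = next;
            }
        }
        result[step] = pos;          // 0-based index of the chosen element

        // Remove it: subtract 1 along the update path.
        for (auto i = pos + 1; i <= n; i += i & -i)
            tree[i] -= 1;
    }
    return result;
}

void main()
{
    writeln(fenwickPermutation(10)); // some permutation of 0 .. 10
}
//----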

Extra documentation never hurts, but keep in mind that a ton of 
algorithms in phobos are lazy and operate this way. Usually, only 
the lazy version is implemented, as the aggressive version is 
trivial (as you suggested).
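
For instance, the eager version is basically just this (a sketch; 
eagerCover is a made-up name, and as noted above it allocates a copy 
and loses ref access to the original elements):

//----
import std.random : randomShuffle;
import std.stdio : writeln;

// Copy the input, shuffle the copy in place, return it.
T[] eagerCover(T)(T[] arr)
{
    auto copy = arr.dup;
    randomShuffle(copy);   // uses the global rndGen by default
    return copy;
}

void main()
{
    writeln(eagerCover([10, 20, 30, 40, 50])); // each element exactly once
}
//----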

AFAIK, most of the ranges in std.range are lazy (though not 
obviously) in one way or another.

>> Last but not least, be warned there is a long-standing bug 
>> with anything in phobos that takes a PRNG by value. Basically, 
>> the PRNG will be duplicated, and generate the same sequence 
>> over and over. Ergo, do NOT pass a specific random generator 
>> to things like randomSample or randomShuffle.
>>
>> This problem is one of the major reasons we are currently (and 
>> slowly) re-designing random into random2.
>
> So, there is a general agreement that in random2, RNG should by 
> default get passed by reference everywhere?  That's nice to 
> hear.
>
> -----
> Ivan Kazmenko.

The agreement is rather to make them reference types, so even 
when passed by value, you won't accidentally duplicate them. This 
is important as you sometimes pass random ranges to algorithms 
that have nothing to do with randomness.

I'd say the "textbook" example would be:

//----
import std.random;
import std.algorithm;
import std.stdio;

void main()
{
     uint[5] arr1;
     arr1[].fill(rndGen); // fill takes rndGen by value, so it gets copied
     uint[5] arr2;
     arr2[].fill(rndGen); // a fresh copy again, starting from the same state
     writeln("arr1: ", arr1[]);
     writeln("arr1: ", arr2[]);
}
//----
arr1: [3622200385, 2579361262, 3208046123, 1753759120, 133131992]
arr2: [3622200385, 2579361262, 3208046123, 1753759120, 133131992]
//----
Oops!
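
To illustrate what reference semantics buys you, here is a sketch only 
(NOT the actual random2 design): wrapping the engine in a class makes 
the generator a reference type, so passing it by value merely copies 
the reference, and both calls keep advancing the same shared state.

//----
import std.random;
import std.algorithm;
import std.stdio;

// Hypothetical wrapper with reference semantics (a sketch, NOT the
// actual random2 design): a class is a reference type, so copies of
// the handle all advance the same underlying engine.
final class SharedRng
{
    Mt19937 engine;

    this(uint seed) { engine = Mt19937(seed); }

    enum bool empty = false;             // an RNG never runs dry
    @property uint front() { return engine.front; }
    void popFront() { engine.popFront(); }
}

void main()
{
    auto rng = new SharedRng(unpredictableSeed);

    uint[5] arr1, arr2;
    arr1[].fill(rng);   // advances the shared engine
    arr2[].fill(rng);   // continues where arr1 left off
    writeln("arr1: ", arr1[]);
    writeln("arr2: ", arr2[]);   // different from arr1 this time
}
//----

With a shared engine, the second fill continues the sequence instead of 
restarting it, so the two arrays come out different.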

