Mir Random and Dlang Ranges [Example]

Fri Nov 25 14:52:38 PST 2016

On Friday, 25 November 2016 at 15:56:45 UTC, Ilya Yaroshenko 
wrote:
> What kind of state should not be copied by value? I thought it 
> is only an Engine.

Unfortunately that's not true.  The sampling algorithm pulls some 
tricks to try to reduce the number of calls to the random number 
generator (as well as to avoid calls to some more expensive 
calculations to transform those variates into what is wanted).

The variable in question is here:
https://github.com/dlang/phobos/blob/f6b4e89d64eb998b80a70f23426e0e509e1933d8/std/random.d#L2413

And you can see here the places where regenerating it from the 
RNG may be avoided:
https://github.com/dlang/phobos/blob/f6b4e89d64eb998b80a70f23426e0e509e1933d8/std/random.d#L2699-L2702
https://github.com/dlang/phobos/blob/f6b4e89d64eb998b80a70f23426e0e509e1933d8/std/random.d#L2763

Essentially the sampling algorithm carries out its own little 
internal pseudo-random process which falls back to using the RNG 
when necessary.

Now, in this specific case, it might be possible to avoid that.  
The paper where this algorithm is introduced is more than 30 
years old and it's possible that optimization is no longer very 
relevant on modern hardware with modern RNGs, so it might be fine 
to just generate the value concerned from the RNG with every 
popFront().

However, that would need to be benchmarked; and more generally, 
it doesn't seem to me to be reasonable to assume that other 
random algorithms might not have similar requirements to maintain 
some state associated with an internal pseudo-random process.

> Engines can not be copied by value in Mir Random. If you mean 
> counters, than it is not correct for Dlang. Almost all D 
> counters like ranges are copied by value. A `ref` for 
> randomSample should be used to functions if thay want to modify 
> its counters (the same logic exists in 
> std.experimental.primitives like popFront ).

All of the "No problem" in your remarks comes out of the 
assumption that it's just the RNG whose state must never be 
copied by value.  But even if we can manage to get rid of the 
problematic state from this particular algorithm, I don't think 
that's an assumption that can necessarily be generalized to all 
random algorithms, which may do all sorts of clever state-based 
tricks to optimize their performance.

Can that kind of state also be wrapped up inside a struct that 
can't be copied by value?  Possibly, but then that makes it 
rather more tricky to use RandomSample in a friendly way inside 
extended range chains as described in my previous post.

Stepping back from the specifics: one of my gripes, not with you 
personally, but with the general discussion around this issue is 
that people almost always seem to assume it's all about the RNGs, 
and that if you stop them from being copied by value, then 
everything else will fall out just fine.  It's very rare that 
people ever give consideration to the random algorithms except as 
an afterthought.  That means that the solutions proposed very 
often seem nice and promising when looked at in terms of RNGs, 
and sometimes also in terms of random distributions, but fall 
flat the moment random algorithms come into play.

I've had a lot of conversations with different people around this 
issue, and I've tried out a bunch of different ideas, including 
functor RNGs which would be wrapped with a range API accessing 
them via pointer (a suggestion Dicebot among others made to me).  
In every case, I've run into difficulty when moving from RNGs and 
distributions to random algorithms like RandomCover and 
RandomSample.

Now, that's very possibly just my failure -- it would be great if 
you could identify a solution where I could not.  But the reason 
I described the RandomSample-related problem above was not to get 
assurances that it would not be a problem.  It was meant to 
encourage you to look in detail yourself at this problem, to 
experience its nuances for yourself.  It may not be as 
unproblematic as you think.

> BTW, I am looking for your RandomSample PR! :-)

Well, I would love to give you one, but in this case I do think 
it may be worth your examining the algorithm and its 
implementation in more detail yourself.

Please understand, I'm not trying to be obstructive here.  I 
genuinely think there are factors you have not considered in this 
case, that would be useful for you to look into in greater depth 
before you finalize your designs.  BTW the original research 
papers on the algorithm are here, if you want to take a look:
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.329.6400&rep=rep1&type=pdf
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.94.1689&rep=rep1&type=pdf

What I will follow up with is to benchmark the sampling algorithm 
without the optimization trick.  I'll also try to submit a PR for 
the xoroshiro and xorshift* RNGs, and post issues containing some 
of the feedback promised on other aspects of the design.  You may 
have to be a little patient with me, though ... it's a busy time 
right now :-\