Mir Random and Dlang Ranges [Example]
Joseph Rushton Wakeling via Digitalmars-d
digitalmars-d at puremagic.com
Fri Nov 25 14:52:38 PST 2016
On Friday, 25 November 2016 at 15:56:45 UTC, Ilya Yaroshenko
wrote:
> What kind of state should not be copied by value? I thought it
> is only an Engine.
Unfortunately that's not true. The sampling algorithm pulls some
tricks to try to reduce the number of calls to the random number
generator (as well as to avoid calls to some more expensive
calculations to transform those variates into what is wanted).
The variable in question is here:
https://github.com/dlang/phobos/blob/f6b4e89d64eb998b80a70f23426e0e509e1933d8/std/random.d#L2413
And you can see here the places where regenerating it from the
RNG may be avoided:
https://github.com/dlang/phobos/blob/f6b4e89d64eb998b80a70f23426e0e509e1933d8/std/random.d#L2699-L2702
https://github.com/dlang/phobos/blob/f6b4e89d64eb998b80a70f23426e0e509e1933d8/std/random.d#L2763
Essentially the sampling algorithm carries out its own little
internal pseudo-random process which falls back to using the RNG
when necessary.
Now, in this specific case, it might be possible to avoid that.
The paper where this algorithm is introduced is more than 30
years old and it's possible that optimization is no longer very
relevant on modern hardware with modern RNGs, so it might be fine
to just generate the value concerned from the RNG with every
popFront().
However, that would need to be benchmarked; and more generally,
it doesn't seem to me to be reasonable to assume that other
random algorithms might not have similar requirements to maintain
some state associated with an internal pseudo-random process.
> Engines can not be copied by value in Mir Random. If you mean
> counters, than it is not correct for Dlang. Almost all D
> counters like ranges are copied by value. A `ref` for
> randomSample should be used to functions if thay want to modify
> its counters (the same logic exists in
> std.experimental.primitives like popFront ).
All of the "No problem" in your remarks comes out of the
assumption that it's just the RNG whose state must never be
copied by value. But even if we can manage to get rid of the
problematic state from this particular algorithm, I don't think
that's an assumption that can necessarily be generalized to all
random algorithms, which may do all sorts of clever state-based
tricks to optimize their performance.
Can that kind of state also be wrapped up inside a struct that
can't be copied by value? Possibly, but then that makes it
rather more tricky to use RandomSample in a friendly way inside
extended range chains as described in my previous post.
Stepping back from the specifics: one of my gripes, not with you
personally, but with the general discussion around this issue is
that people almost always seem to assume it's all about the RNGs,
and that if you stop them from being copied by value, then
everything else will fall out just fine. It's very rare that
people ever give consideration to the random algorithms except as
an afterthought. That means that the solutions proposed very
often seem nice and promising when looked at in terms of RNGs,
and sometimes also in terms of random distributions, but fall
flat the moment random algorithms come into play.
I've had a lot of conversations with different people around this
issue, and I've tried out a bunch of different ideas, including
functor RNGs which would be wrapped with a range API accessing
them via pointer (a suggestion Dicebot among others made to me).
In every case, I've run into difficulty when moving from RNGs and
distributions to random algorithms like RandomCover and
RandomSample.
Now, that's very possibly just my failure -- it would be great if
you could identify a solution where I could not. But the reason
I described the RandomSample-related problem above was not to get
assurances that it would not be a problem. It was meant to
encourage you to look in detail yourself at this problem, to
experience its nuances for yourself. It may not be as
unproblematic as you think.
> BTW, I am looking for your RandomSample PR! :-)
Well, I would love to give you one, but in this case I do think
it may be worth your examining the algorithm and its
implementation in more detail yourself.
Please understand, I'm not trying to be obstructive here. I
genuinely think there are factors you have not considered in this
case, that would be useful for you to look into in greater depth
before you finalize your designs. BTW the original research
papers on the algorithm are here, if you want to take a look:
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.329.6400&rep=rep1&type=pdf
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.94.1689&rep=rep1&type=pdf
What I will follow up with is to benchmark the sampling algorithm
without the optimization trick. I'll also try to submit a PR for
the xoroshiro and xorshift* RNGs, and post issues containing some
of the feedback promised on other aspects of the design. You may
have to be a little patient with me, though ... it's a busy time
right now :-\
More information about the Digitalmars-d
mailing list