[OT] Algorithm question
Ivan Kazmenko via Digitalmars-d
digitalmars-d at puremagic.com
Mon May 1 07:38:09 PDT 2017
On Monday, 1 May 2017 at 04:15:35 UTC, H. S. Teoh wrote:
> Given a set A of n elements (let's say it's a random-access
> range of
> size n, where n is relatively large), and a predicate P(x) that
> specifies some subset of A of elements that we're interested
> in, what's
> the best algorithm (in terms of big-O time complexity) for
> selecting a
> random element x satisfying P(x), such that elements that
> satisfy P(x)
> have equal probability of being chosen? (Elements that do not
> satisfy
> P(x) are disregarded.)
I'd like to note here that, if you make use of the same P(x) many
times (instead of different predicates on each call), it makes
sense to spend O(n) time and memory filtering by that predicate
and storing the result, and then answer each query in O(1).
> 3) Permutation walk:
>
> auto r = ... /* elements of A */
> foreach (i; iota(0 .. r.length).randomPermutation) {
> if (P(r[i])) return r[i];
> }
> /* no elements satisfy P(x) */
>
> Advantages: if an element that satisfies P(x) is found
> early, the
> loop will terminate before n iterations. This seems like the
> best of
> both worlds of (1) and (2), except:
>
> Disadvantages: AFAIK, generating a random permutation of
> indices from
> 0 .. n requires at least O(n) time, so any advantage we may
> have had
> seems to be negated.
>
> Is there an algorithm for *incrementally* generating a random
> permutation of indices? If there is, we could use that in (3)
> and thus achieve the best of both worlds between early
> termination if an element satisfying P(x) is found, and
> guaranteeing termination after n iterations if no elements
> satisfying P(x) exists.
Yes, there is.
There are actually two variations of Fisher-Yates shuffle:
(https://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle)
1.
auto p = n.iota.array;
foreach (pos; 0..n) {
auto otherPos = uniform (0, pos + 1);
swap (p[pos], p[otherPos]);
}
When we look at this after k-th iteration, the first k elements
are randomly and uniformly permuted, and the rest (n-k) are left
untouched.
2.
auto p = n.iota.array;
foreach (pos; 0..n) {
auto otherPos = uniform (pos, n);
swap (p[pos], p[otherPos]);
}
When we look at this after k-th iteration, the first k elements
are a random combination of all n elements, and this combination
is randomly and uniformly permuted. So, the second variation is
what we need: each new element is randomly and uniformly selected
from all the elements left. Once we get the first element
satisfying the predicate, we can just terminate the loop. If
there are m out of n elements satisfying the predicate, the
average number of steps is n/m.
Now, the problem is that both of these allocate n "size_t"-s of
memory to start with. And your problem does not allow to shuffle
the elements of the original array in place, so we do need an
external permutation for these algorithms. However, there are at
least two ways to mitigate that:
(I)
We can allocate the permutation once using n time and memory, and
then, on every call, just reuse it in its current state in n/m
time. It does not matter if the permutation is not identity
permutation: by symmetry argument, any starting permutation will
do just fine.
(II)
We can store the permutation p in an associative array instead of
a regular array, actually storing only the elements accessed at
least once, and assuming other elements to satisfy the identity
p[x] = x. So, if we finish in n/m steps on average, the time and
extra memory used will be O(n/m) too. I can put together an
example implementation if this best satisfies your requirements.
Ivan Kazmenko.
More information about the Digitalmars-d
mailing list