[OT] Algorithm question
H. S. Teoh via Digitalmars-d
digitalmars-d at puremagic.com
Sun Apr 30 21:15:35 PDT 2017
I'm been thinking about the following problem, and thought I'd pick the
brains of the bright people around these parts...
Given a set A of n elements (let's say it's a random-access range of
size n, where n is relatively large), and a predicate P(x) that
specifies some subset of A of elements that we're interested in, what's
the best algorithm (in terms of big-O time complexity) for selecting a
random element x satisfying P(x), such that elements that satisfy P(x)
have equal probability of being chosen? (Elements that do not satisfy
P(x) are disregarded.)
Which elements of A satisfy P(x) is not known ahead of time, nor is the
relative proportion of elements that satisfy P(x) or not.
Note that A is considered to be immutable (swapping elements or
otherwise changing A is not allowed).
So far, I have come up with the following algorithms:
1) "Random shooting":
auto r = ... /* elements of A */
for (;;) {
auto i = uniform(0, r.length);
if (P(r[i])) return r[i];
}
Advantages: If a large percentage of elements in A satisfy P(x), then
the loop will terminate within a small number of iterations; if the
majority of elements satisfy P(x), this will terminate well below n
iterations.
Disadvantages: If only a small percentage of elements satisfy P(x),
then this loop could take arbitrarily long to terminate, and it will
not terminate at all if no elements satisfy P(x).
2) "Turtle walk":
auto r = ... /* elements of A */
int nSatisfy = 0;
ElementType!A currentChoice;
bool found = false;
foreach (e; r) {
if (P(e)) {
if (uniform(0, nSatisfy) == 0)
{
currentChoice = e;
found = true;
}
nSatisfy++;
}
}
if (found) return currentChoice;
else ... /* no elements satisfy P(x) */
Advantages: If there is any element at all in A that satisfies P(x),
it will be found. The loop always terminates after n iterations.
Disadvantages: Even if 99% of elements in A satisfies P(x), we still
have to traverse the entire data set before we terminate. (We cannot
terminate before that, otherwise the probability of elements
satisfying P(x) being selected will not be even.)
3) Permutation walk:
auto r = ... /* elements of A */
foreach (i; iota(0 .. r.length).randomPermutation) {
if (P(r[i])) return r[i];
}
/* no elements satisfy P(x) */
Advantages: if an element that satisfies P(x) is found early, the
loop will terminate before n iterations. This seems like the best of
both worlds of (1) and (2), except:
Disadvantages: AFAIK, generating a random permutation of indices from
0 .. n requires at least O(n) time, so any advantage we may have had
seems to be negated.
Is there an algorithm for *incrementally* generating a random
permutation of indices? If there is, we could use that in (3) and thus
achieve the best of both worlds between early termination if an element
satisfying P(x) is found, and guaranteeing termination after n
iterations if no elements satisfying P(x) exists.
T
--
The early bird gets the worm. Moral: ewww...
More information about the Digitalmars-d
mailing list