[OT] Algorithm question

Sun Apr 30 21:15:35 PDT 2017

I'm been thinking about the following problem, and thought I'd pick the
brains of the bright people around these parts...

Given a set A of n elements (let's say it's a random-access range of
size n, where n is relatively large), and a predicate P(x) that
specifies some subset of A of elements that we're interested in, what's
the best algorithm (in terms of big-O time complexity) for selecting a
random element x satisfying P(x), such that elements that satisfy P(x)
have equal probability of being chosen? (Elements that do not satisfy
P(x) are disregarded.)

Which elements of A satisfy P(x) is not known ahead of time, nor is the
relative proportion of elements that satisfy P(x) or not.

Note that A is considered to be immutable (swapping elements or
otherwise changing A is not allowed).

So far, I have come up with the following algorithms:

1) "Random shooting":

	auto r = ... /* elements of A */
	for (;;) {
		auto i = uniform(0, r.length);
		if (P(r[i])) return r[i];
	}

   Advantages: If a large percentage of elements in A satisfy P(x), then
   the loop will terminate within a small number of iterations; if the
   majority of elements satisfy P(x), this will terminate well below n
   iterations.

   Disadvantages: If only a small percentage of elements satisfy P(x),
   then this loop could take arbitrarily long to terminate, and it will
   not terminate at all if no elements satisfy P(x).

2) "Turtle walk":

	auto r = ... /* elements of A */
	int nSatisfy = 0;
	ElementType!A currentChoice;
	bool found = false;
	foreach (e; r) {
		if (P(e)) {
			if (uniform(0, nSatisfy) == 0)
			{
				currentChoice = e;
				found = true;
			}
			nSatisfy++;
		}
	}
	if (found) return currentChoice;
	else ... /* no elements satisfy P(x) */

   Advantages: If there is any element at all in A that satisfies P(x),
   it will be found. The loop always terminates after n iterations.

   Disadvantages: Even if 99% of elements in A satisfies P(x), we still
   have to traverse the entire data set before we terminate. (We cannot
   terminate before that, otherwise the probability of elements
   satisfying P(x) being selected will not be even.)

3) Permutation walk:

	auto r = ... /* elements of A */
	foreach (i; iota(0 .. r.length).randomPermutation) {
		if (P(r[i])) return r[i];
	}
	/* no elements satisfy P(x) */

   Advantages: if an element that satisfies P(x) is found early, the
   loop will terminate before n iterations. This seems like the best of
   both worlds of (1) and (2), except:

   Disadvantages: AFAIK, generating a random permutation of indices from
   0 .. n requires at least O(n) time, so any advantage we may have had
   seems to be negated.

Is there an algorithm for *incrementally* generating a random
permutation of indices? If there is, we could use that in (3) and thus
achieve the best of both worlds between early termination if an element
satisfying P(x) is found, and guaranteeing termination after n
iterations if no elements satisfying P(x) exists.

T

-- 
The early bird gets the worm. Moral: ewww...