[OT] Algorithm question

Mon May 1 01:42:05 PDT 2017

On Monday, 1 May 2017 at 04:15:35 UTC, H. S. Teoh wrote:
> I'm been thinking about the following problem, and thought I'd 
> pick the brains of the bright people around these parts...
>
> Given a set A of n elements (let's say it's a random-access 
> range of
> size n, where n is relatively large), and a predicate P(x) that
> specifies some subset of A of elements that we're interested 
> in, what's
> the best algorithm (in terms of big-O time complexity) for 
> selecting a
> random element x satisfying P(x), such that elements that 
> satisfy P(x)
> have equal probability of being chosen? (Elements that do not 
> satisfy
> P(x) are disregarded.)
>
> Which elements of A satisfy P(x) is not known ahead of time, 
> nor is the
> relative proportion of elements that satisfy P(x) or not.
>
> Note that A is considered to be immutable (swapping elements or
> otherwise changing A is not allowed).
>
> So far, I have come up with the following algorithms:
>
> 1) "Random shooting":
>
> 	auto r = ... /* elements of A */
> 	for (;;) {
> 		auto i = uniform(0, r.length);
> 		if (P(r[i])) return r[i];
> 	}
>
>    Advantages: If a large percentage of elements in A satisfy 
> P(x), then
>    the loop will terminate within a small number of iterations; 
> if the
>    majority of elements satisfy P(x), this will terminate well 
> below n
>    iterations.
>
>    Disadvantages: If only a small percentage of elements 
> satisfy P(x),
>    then this loop could take arbitrarily long to terminate, and 
> it will
>    not terminate at all if no elements satisfy P(x).

You can recover O(n) calls to P by caching the results. You can 
recover O(n) for both P and rng by progressively removing 
elements from r or any other method that results in an easily 
samplable set of all elements of r not yet visited (and therefore 
possible candidates).

I like this one (warning, just thought of, untested):

auto r = ... /* elements of A */
auto nRemaining = r.length;
while (nRemaining) {
	auto i = uniform(0, nRemaining);
	if (P(r[i])) return r[i];
	else swap(r[i], r[--nRemaining]);
}

> 3) Permutation walk:
>
> 	auto r = ... /* elements of A */
> 	foreach (i; iota(0 .. r.length).randomPermutation) {
> 		if (P(r[i])) return r[i];
> 	}
> 	/* no elements satisfy P(x) */
>
>    Advantages: if an element that satisfies P(x) is found 
> early, the
>    loop will terminate before n iterations. This seems like the 
> best of
>    both worlds of (1) and (2), except:
>
>    Disadvantages: AFAIK, generating a random permutation of 
> indices from
>    0 .. n requires at least O(n) time, so any advantage we may 
> have had
>    seems to be negated.
>
> Is there an algorithm for *incrementally* generating a random 
> permutation of indices? If there is, we could use that in (3) 
> and thus achieve the best of both worlds between early 
> termination if an element satisfying P(x) is found, and 
> guaranteeing termination after n iterations if no elements 
> satisfying P(x) exists.
>
>
> T

Yes. As a matter of fact it would be the logical extension of my 
algorithm above for when you can't or don't want to change r 
(again, only just thought of this, untested...):

auto r = ... /* elements of A */
auto indices = iota(r.length).array;
auto nRemaining = r.length;
while (nRemaining) {
	auto i = uniform(0, nRemaining);
	if (P(r[indices[i]])) return r[indices[i]];
	else swap(indices[i], indices[--nRemaining]);
}

You could also do the same thing but with an array of pointers to 
elements of r instead of indices.