[OT] Algorithm question

Mon May 1 07:38:09 PDT 2017

On Monday, 1 May 2017 at 04:15:35 UTC, H. S. Teoh wrote:
> Given a set A of n elements (let's say it's a random-access 
> range of
> size n, where n is relatively large), and a predicate P(x) that
> specifies some subset of A of elements that we're interested 
> in, what's
> the best algorithm (in terms of big-O time complexity) for 
> selecting a
> random element x satisfying P(x), such that elements that 
> satisfy P(x)
> have equal probability of being chosen? (Elements that do not 
> satisfy
> P(x) are disregarded.)

I'd like to note here that, if you make use of the same P(x) many 
times (instead of different predicates on each call), it makes 
sense to spend O(n) time and memory filtering by that predicate 
and storing the result, and then answer each query in O(1).

> 3) Permutation walk:
>
> 	auto r = ... /* elements of A */
> 	foreach (i; iota(0 .. r.length).randomPermutation) {
> 		if (P(r[i])) return r[i];
> 	}
> 	/* no elements satisfy P(x) */
>
>    Advantages: if an element that satisfies P(x) is found 
> early, the
>    loop will terminate before n iterations. This seems like the 
> best of
>    both worlds of (1) and (2), except:
>
>    Disadvantages: AFAIK, generating a random permutation of 
> indices from
>    0 .. n requires at least O(n) time, so any advantage we may 
> have had
>    seems to be negated.
>
> Is there an algorithm for *incrementally* generating a random 
> permutation of indices? If there is, we could use that in (3) 
> and thus achieve the best of both worlds between early 
> termination if an element satisfying P(x) is found, and 
> guaranteeing termination after n iterations if no elements 
> satisfying P(x) exists.

Yes, there is.

There are actually two variations of Fisher-Yates shuffle:
(https://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle)

1.
     auto p = n.iota.array;
     foreach (pos; 0..n) {
         auto otherPos = uniform (0, pos + 1);
         swap (p[pos], p[otherPos]);
     }

When we look at this after k-th iteration, the first k elements 
are randomly and uniformly permuted, and the rest (n-k) are left 
untouched.

2.
     auto p = n.iota.array;
     foreach (pos; 0..n) {
         auto otherPos = uniform (pos, n);
         swap (p[pos], p[otherPos]);
     }

When we look at this after k-th iteration, the first k elements 
are a random combination of all n elements, and this combination 
is randomly and uniformly permuted.  So, the second variation is 
what we need: each new element is randomly and uniformly selected 
from all the elements left.  Once we get the first element 
satisfying the predicate, we can just terminate the loop.  If 
there are m out of n elements satisfying the predicate, the 
average number of steps is n/m.

Now, the problem is that both of these allocate n "size_t"-s of 
memory to start with.  And your problem does not allow to shuffle 
the elements of the original array in place, so we do need an 
external permutation for these algorithms.  However, there are at 
least two ways to mitigate that:

(I)
We can allocate the permutation once using n time and memory, and 
then, on every call, just reuse it in its current state in n/m 
time.  It does not matter if the permutation is not identity 
permutation: by symmetry argument, any starting permutation will 
do just fine.

(II)
We can store the permutation p in an associative array instead of 
a regular array, actually storing only the elements accessed at 
least once, and assuming other elements to satisfy the identity 
p[x] = x.  So, if we finish in n/m steps on average, the time and 
extra memory used will be O(n/m) too.  I can put together an 
example implementation if this best satisfies your requirements.

Ivan Kazmenko.