[OT] Algorithm question

H. S. Teoh via Digitalmars-d digitalmars-d at puremagic.com
Mon May 1 10:20:44 PDT 2017


On Mon, May 01, 2017 at 02:38:09PM +0000, Ivan Kazmenko via Digitalmars-d wrote:
> On Monday, 1 May 2017 at 04:15:35 UTC, H. S. Teoh wrote:
> > Given a set A of n elements (let's say it's a random-access range of
> > size n, where n is relatively large), and a predicate P(x) that
> > specifies some subset of A of elements that we're interested in,
> > what's the best algorithm (in terms of big-O time complexity) for
> > selecting a random element x satisfying P(x), such that elements
> > that satisfy P(x) have equal probability of being chosen? (Elements
> > that do not satisfy P(x) are disregarded.)
> 
> I'd like to note here that, if you make use of the same P(x) many
> times (instead of different predicates on each call), it makes sense
> to spend O(n) time and memory filtering by that predicate and storing
> the result, and then answer each query in O(1).

Unfortunately, P(x) could be different each time (sorry, should have
stated this explicitly), and A may also change in between calls to this
random selection algorithm.  So filtering would not be a good approach.


[...]
> There are actually two variations of Fisher-Yates shuffle:
> (https://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle)
> 
> 1.
>     auto p = n.iota.array;
>     foreach (pos; 0..n) {
>         auto otherPos = uniform (0, pos + 1);
>         swap (p[pos], p[otherPos]);
>     }
> 
> When we look at this after k-th iteration, the first k elements are
> randomly and uniformly permuted, and the rest (n-k) are left
> untouched.
> 
> 2.
>     auto p = n.iota.array;
>     foreach (pos; 0..n) {
>         auto otherPos = uniform (pos, n);
>         swap (p[pos], p[otherPos]);
>     }
> 
> When we look at this after k-th iteration, the first k elements are a
> random combination of all n elements, and this combination is randomly
> and uniformly permuted.  So, the second variation is what we need:
> each new element is randomly and uniformly selected from all the
> elements left.  Once we get the first element satisfying the
> predicate, we can just terminate the loop.  If there are m out of n
> elements satisfying the predicate, the average number of steps is n/m.

But wouldn't creating the array p already be O(n)? Albeit, somewhat
faster than n iterations of the main loop since copying iota ought to be
somewhat cheaper than generating a random number and swapping two
elements.


> Now, the problem is that both of these allocate n "size_t"-s of memory
> to start with.  And your problem does not allow to shuffle the
> elements of the original array in place, so we do need an external
> permutation for these algorithms.  However, there are at least two
> ways to mitigate that:
> 
> (I)
> We can allocate the permutation once using n time and memory, and
> then, on every call, just reuse it in its current state in n/m time.
> It does not matter if the permutation is not identity permutation: by
> symmetry argument, any starting permutation will do just fine.

I like this idea very much. This lets us only pay once for creating the
index array, and reuse it almost "for free" thereafter.

Still, the O(n) space requirement could potentially be onerous, since n
is expected to be rather large.


> (II)
> We can store the permutation p in an associative array instead of a
> regular array, actually storing only the elements accessed at least
> once, and assuming other elements to satisfy the identity p[x] = x.
> So, if we finish in n/m steps on average, the time and extra memory
> used will be O(n/m) too.  I can put together an example implementation
> if this best satisfies your requirements.
[...]

This is interesting, and relaxes the O(n) space requirement initially.
After enough queries, though, I'd expect the AA to eventually grow to n
elements (as there is a chance some queries will end up finding no
elements that satisfy P(x) so it would generate the entire permutation).

But I like your idea of reusing the index array.  It's definitely a
possibility!


T

-- 
Your inconsistency is the only consistent thing about you! -- KD


More information about the Digitalmars-d mailing list