random cover of a range

Thu Feb 12 12:01:01 PST 2009

"Andrei Alexandrescu" wrote
> Ok, following Leonardo's and Denis' ideas on covering an immutable array 
> in random order, I thought about it and came up with one algorithm, in 
> addition to the obvious one that stores one integer per element (see 
> Denis' code).
>
> Here's the algorithm. It stores one bit per element and takes O(n log n) 
> time to cover a range of length n.
>
> 1. Given a random-access range r of length n, create a bitmap with one bit 
> per element in r. The range is initially all zeros.
>
> 2. At each step, select a random index in the _original_ array, i.e. a 
> random number in 0 .. r.length. Then walk upwards (with wrapping at the 
> end of the range) until you find a bit = 0 in the bitmap. Mark that bit = 
> 1 and return the found element.
>
> 3. The cover ends when all bits in the bitmap are 1.
>
> I did a ballpark complexity estimate by approximating the average number 
> of steps in (2) with n / (n - k), where k is the number of 1s in the 
> bitmap. IIRC the average number of steps to hit a 1 in a binomial 
> distribution with p = (n - k) / n is 1/p. Right? In that case, the total 
> number of steps taken by the algo above is:
>
> N = n / 1 + n / 2 + ... + n / (n - 1) = n * (1 + 1/2 + 1/3 + ... + 1/(n - 
> 1)) < O (n log n)
>
> Makes sense?

OK, but you still have O(n) space requirement (albeit with a much smaller 
constant).

There are some other optimizations you can make.  For instance, finding a 
bit can be optimized somewhat (i.e. if one integer's worth of bits is 
0xffffffff, you can skip 32 bits, and when you find an int which contains 
some 0 bits, you can even use a lookup table for 8 bits to tell you what the 
next bit should be, then you can go 8 bits at a time looking for your empty 
bit).  This might reduce the "search time" to a more acceptable number.  At 
most, the search time for a bit will be n / 32 + 3.

I think the tradeoff might be significant between a larger space requirement 
and an O(n) vs. O(nlgn) algo, but it would be highly dependent on the 
dataset size.

There's one other "obvious" algorithm.. store pointers to the original 
elements, or null if the element hasn't been swapped.  The code would look 
something like:

T [] _origArray;
T *[] _tmparray;
T *_headElem;

private T *getElem(int idx)
{
   if(auto retval  = _tmparray[idx])
      return retval;
   return &_origArray[idx];
}

void next()
{
   int idx = randomInt(0.._tmparray.length);
   _headElem = getElem(idx);
   _tmparray[idx] = getElem(_tmparray.length - 1);
   _tmparray.length = _tmparray.length - 1;
}

Same space requirements as the index generating range, but it may be faster 
in terms of looking up values.

-Steve