randomShuffle
Diggory
diggsey at googlemail.com
Tue Jun 4 01:30:57 PDT 2013
On Monday, 3 June 2013 at 21:24:50 UTC, Joseph Rushton Wakeling
wrote:
> On 06/03/2013 08:28 PM, Diggory wrote:
>> I'd guess that the heavy use of floating point arithmetic to
>> calculate the step
>> sizes means that algorithm has a fairly large constant
>> overhead even though the
>> complexity is smaller.
>
> Yes, I agree. There might be some optimizations that could be
> done there.
Well, I've come up with this new algorithm:
import std.random;

uint[] randomSample(uint N, uint M) {
    assert(N <= M);
    uint[] result = new uint[N];

    // Open-addressing hash table entry: `key` is a position in the
    // virtual 0..M-1 array, `index` is (the value now at that position)
    // plus 1, so that index == 0 marks an empty slot.
    struct hashPair {
        uint key;
        uint index;
    }
    size_t tableSize = size_t(N) * 4;
    if (tableSize > M)
        tableSize = M;
    hashPair[] table = new hashPair[tableSize];

    // Value currently sitting at position `pos`; positions that were
    // never swapped still hold their own index.
    uint valueAt(uint pos) {
        size_t h = pos % tableSize;
        while (table[h].index) {
            if (table[h].key == pos)
                return table[h].index - 1;
            h = (h + 1) % tableSize;
        }
        return pos;
    }

    for (uint i = 0; i < N; ++i) {
        uint v = (rndGen.front % (M-i)) + i;   // random position in [i, M)
        rndGen.popFront();
        uint newv = v;
        // Whatever is at position i "swaps" into position v.  This is i
        // itself unless an earlier draw already landed on position i.
        uint moved = valueAt(i);
        size_t vhash = v % tableSize;
        while (table[vhash].index) {
            if (table[vhash].key == v) {
                newv = table[vhash].index - 1;
                table[vhash].index = moved + 1;
                goto done;
            }
            vhash = (vhash + 1) % tableSize;
        }
        table[vhash].key = v;
        table[vhash].index = moved + 1;
    done:
        result[i] = newv;
    }
    return result;
}
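A quick sanity check (just an illustrative main(), assuming the
function above is in scope; every value should come out at most once):

void main() {
    import std.algorithm : sort, uniq;
    import std.range : walkLength;
    import std.stdio : writeln;

    auto s = randomSample(10, 1000);            // 10 distinct picks from 0..999
    assert(s.dup.sort().uniq.walkLength == 10); // no duplicates
    writeln(s);
}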
It's O(N) rather than O(N²). Conceptually it works by randomly
shuffling the first N items of an array of size M, which takes O(N)
time but would normally need an array of size O(M); to avoid that, it
uses a simple hash table to store just the items which have been
swapped. Since only O(N) items are being shuffled there can only be
O(N) swaps, so the hash table can be O(N) in size while still
offering expected constant-time look-up.
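For reference, the same idea can be written more directly with a
built-in associative array. This is just an illustrative sketch
(`sparseSample` isn't anything in Phobos); it also uses uniform()
rather than % on rndGen.front, which avoids the slight modulo bias:

import std.random : uniform;

// Sketch: swapped[p] records the value that has been moved into
// position p of the virtual 0..M-1 array; untouched positions hold
// their own index.
uint[] sparseSample(uint N, uint M) {
    uint[] result = new uint[N];
    uint[uint] swapped;
    foreach (i; 0 .. N) {
        uint v = uniform(i, M);            // random position in [i, M)
        result[i] = swapped.get(v, v);     // value currently at v
        swapped[v] = swapped.get(i, i);    // value at i moves to v
    }
    return result;
}

The behaviour is the same; the preallocated linear-probing table in
the version above just avoids the associative array's per-insert
allocations.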