randomShuffle

Tue Jun 4 01:39:33 PDT 2013

On Tuesday, 4 June 2013 at 08:30:58 UTC, Diggory wrote:
> On Monday, 3 June 2013 at 21:24:50 UTC, Joseph Rushton Wakeling 
> wrote:
>> On 06/03/2013 08:28 PM, Diggory wrote:
>>> I'd guess that the heavy use of floating point arithmetic to 
>>> calculate the step
>>> sizes means that algorithm has a fairly large constant 
>>> overhead even though the
>>> complexity is smaller.
>>
>> Yes, I agree.  There might be some optimizations that could be 
>> done there.
>
> Well I've come up with this new algorithm:
>
>
> uint[] randomSample(uint N, uint M) {
> 	uint[] result = new uint[N];
>
> 	struct hashPair {
> 		uint key;
> 		uint index;
> 	}
>
> 	size_t tableSize = N*4;
> 	if (tableSize > M)
> 		tableSize = M;
>
> 	hashPair[] table = new hashPair[tableSize];
>
> 	for (uint i = 0; i < N; ++i) {
> 		uint v = (rndGen.front % (M-i))+i;
> 		uint newv = v;
> 		rndGen.popFront();
>
> 		uint vhash = v%tableSize;
>
> 		while (table[vhash].index) {
> 			if (table[vhash].key == v) {
> 				newv = table[vhash].index-1;
> 				table[vhash].index = i+1;
> 				goto done;
> 			}
>
> 			vhash = (vhash+1)%tableSize;
> 		}
>
> 		table[vhash].key = v;
> 		table[vhash].index = i+1;
>
> done:
> 		result[i] = newv;
> 	}
>
> 	return result;
> }
>
>
> It's O(N) rather than O(N²). Conceptually it works by randomly 
> shuffling the first N items in an array of size M which takes 
> O(N) time but requires an array of size O(M), so to avoid this 
> it uses a simple hash table to store just the items which have 
> been swapped. Since only O(N) items are being shuffled there 
> can only be O(N) swaps and so the hash table can be O(N) in 
> size while still offering constant time look-up.

OK, posted too soon! There's a small bug in this one where it 
could potentially return the same value twice, it can be fixed by 
looking up "i" in the hash table as well as "v".