randomShuffle

Diggory diggsey at googlemail.com
Tue Jun 4 01:39:33 PDT 2013


On Tuesday, 4 June 2013 at 08:30:58 UTC, Diggory wrote:
> On Monday, 3 June 2013 at 21:24:50 UTC, Joseph Rushton Wakeling 
> wrote:
>> On 06/03/2013 08:28 PM, Diggory wrote:
>>> I'd guess that the heavy use of floating point arithmetic to 
>>> calculate the step
>>> sizes means that algorithm has a fairly large constant 
>>> overhead even though the
>>> complexity is smaller.
>>
>> Yes, I agree.  There might be some optimizations that could be 
>> done there.
>
> Well I've come up with this new algorithm:
>
>
> uint[] randomSample(uint N, uint M) {
> 	uint[] result = new uint[N];
>
> 	struct hashPair {
> 		uint key;
> 		uint index;
> 	}
>
> 	size_t tableSize = N*4;
> 	if (tableSize > M)
> 		tableSize = M;
>
> 	hashPair[] table = new hashPair[tableSize];
>
> 	for (uint i = 0; i < N; ++i) {
> 		uint v = (rndGen.front % (M-i))+i;
> 		uint newv = v;
> 		rndGen.popFront();
>
> 		uint vhash = v%tableSize;
>
> 		while (table[vhash].index) {
> 			if (table[vhash].key == v) {
> 				newv = table[vhash].index-1;
> 				table[vhash].index = i+1;
> 				goto done;
> 			}
>
> 			vhash = (vhash+1)%tableSize;
> 		}
>
> 		table[vhash].key = v;
> 		table[vhash].index = i+1;
>
> done:
> 		result[i] = newv;
> 	}
>
> 	return result;
> }
>
>
> It's O(N) rather than O(N²). Conceptually it works by randomly 
> shuffling the first N items in an array of size M which takes 
> O(N) time but requires an array of size O(M), so to avoid this 
> it uses a simple hash table to store just the items which have 
> been swapped. Since only O(N) items are being shuffled there 
> can only be O(N) swaps and so the hash table can be O(N) in 
> size while still offering constant time look-up.

OK, posted too soon! There's a small bug in this one where it 
could potentially return the same value twice, it can be fixed by 
looking up "i" in the hash table as well as "v".


More information about the Digitalmars-d-learn mailing list