randomShuffle
Diggory
diggsey at googlemail.com
Tue Jun 4 01:30:57 PDT 2013
On Monday, 3 June 2013 at 21:24:50 UTC, Joseph Rushton Wakeling
wrote:
> On 06/03/2013 08:28 PM, Diggory wrote:
>> I'd guess that the heavy use of floating point arithmetic to
>> calculate the step
>> sizes means that algorithm has a fairly large constant
>> overhead even though the
>> complexity is smaller.
>
> Yes, I agree. There might be some optimizations that could be
> done there.
Well, I've come up with this new algorithm:
import std.random;

uint[] randomSample(uint N, uint M) {
    assert(N <= M);
    uint[] result = new uint[N];

    // Open-addressing hash table entry: `key` is a position in the
    // virtual 0..M-1 array, `index` is (the value now at that position)
    // plus 1, so that index == 0 marks an empty slot.
    struct hashPair {
        uint key;
        uint index;
    }
    size_t tableSize = size_t(N) * 4;
    if (tableSize > M)
        tableSize = M;
    hashPair[] table = new hashPair[tableSize];

    // Value currently sitting at position `pos`; positions that were
    // never swapped still hold their own index.
    uint valueAt(uint pos) {
        size_t h = pos % tableSize;
        while (table[h].index) {
            if (table[h].key == pos)
                return table[h].index - 1;
            h = (h + 1) % tableSize;
        }
        return pos;
    }

    for (uint i = 0; i < N; ++i) {
        uint v = (rndGen.front % (M-i)) + i;   // random position in [i, M)
        rndGen.popFront();
        uint newv = v;
        // Whatever is at position i "swaps" into position v.  This is i
        // itself unless an earlier draw already landed on position i.
        uint moved = valueAt(i);
        size_t vhash = v % tableSize;
        while (table[vhash].index) {
            if (table[vhash].key == v) {
                newv = table[vhash].index - 1;
                table[vhash].index = moved + 1;
                goto done;
            }
            vhash = (vhash + 1) % tableSize;
        }
        table[vhash].key = v;
        table[vhash].index = moved + 1;
    done:
        result[i] = newv;
    }
    return result;
}
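A quick sanity check (just an illustrative main(), assuming the
function above is in scope; every value should come out at most once):

void main() {
    import std.algorithm : sort, uniq;
    import std.range : walkLength;
    import std.stdio : writeln;

    auto s = randomSample(10, 1000);            // 10 distinct picks from 0..999
    assert(s.dup.sort().uniq.walkLength == 10); // no duplicates
    writeln(s);
}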
It's O(N) rather than O(N²). Conceptually it works by randomly
shuffling the first N items of an array of size M, which takes O(N)
time but would normally need an array of size O(M); to avoid that, it
uses a simple hash table to store just the items which have been
swapped. Since only O(N) items are being shuffled there can only be
O(N) swaps, so the hash table can be O(N) in size while still
offering expected constant-time look-up.
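For reference, the same idea can be written more directly with a
built-in associative array. This is just an illustrative sketch
(`sparseSample` isn't anything in Phobos); it also uses uniform()
rather than % on rndGen.front, which avoids the slight modulo bias:

import std.random : uniform;

// Sketch: swapped[p] records the value that has been moved into
// position p of the virtual 0..M-1 array; untouched positions hold
// their own index.
uint[] sparseSample(uint N, uint M) {
    uint[] result = new uint[N];
    uint[uint] swapped;
    foreach (i; 0 .. N) {
        uint v = uniform(i, M);            // random position in [i, M)
        result[i] = swapped.get(v, v);     // value currently at v
        swapped[v] = swapped.get(i, i);    // value at i moves to v
    }
    return result;
}

The behaviour is the same; the preallocated linear-probing table in
the version above just avoids the associative array's per-insert
allocations.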