Faster sort?
Andrea Fontana via Digitalmars-d
digitalmars-d at puremagic.com
Wed Apr 6 01:15:39 PDT 2016
I did a couple of tries using a "specialized" implementation of
sort!less.
For example using a counting sort for bool seems a good idea.
This code:
auto boolSort(alias less = "a<b")(in bool[] data)
{
import std.array : uninitializedArray;
import std.functional : binaryFun;
// Check if order is true/false or false/true
immutable test = binaryFun!less(true, false);
size_t count;
foreach(ref x; data)
if (x == test) count++;
return chain(repeat(test,count), repeat(!test, data.length -
count));
}
Count number of false (or true, it depends on result of "less"),
and then chain them.
I did a test over 10000 randomized array of 2,4,8,16...65536
elements.
rdmd -O -release -noboundscheck -inline sort.d :
2 inputs:
Faster: 229 μs and 9 hnsecs
Phobos: 215 μs and 5 hnsecs
...
64 inputs:
Faster: 513 μs and 4 hnsecs
Phobos: 2 ms, 143 μs, and 5 hnsecs
...
65536 inputs:
Faster: 2 secs, 700 ms, and 696 μs
Phobos: 6 secs, 835 ms, and 319 μs
Using ldmd2 -O -release -noboundscheck -inline sort.d && ./sort
instead:
2 inputs:
Faster: 0 hnsecs
Phobos: 33 μs and 5 hnsecs
...
65536 inputs:
Faster: 0 hnsecs (???)
Phobos: 7 secs, 865 ms, 450 μs, and 6 hnsecs
The same thing works for byte[] and ubyte[] array using a few
tricks (eg: using counting sort and then sorting only keys).
So for ubyte[]/byte[] (ldc):
2 inputs:
1 ms, 305 μs, and 4 hnsecs
31 μs and 4 hnsecs
...
256 inputs:
8 ms, 76 μs, and 3 hnsecs
8 ms, 807 μs, and 3 hnsecs
...
65536 inputs:
347 ms, 330 μs, and 9 hnsecs
14 secs, 214 ms, 66 μs, and 5 hnsecs
Anyway implementation for byte/ubyte should be improved as long
as it probably won't work fine if less is a functor or delegate:
auto byteSort(alias less = "a<b", T)(in T[] data)
if (is(T == byte) || is(T == ubyte))
{
import std.range : iota;
import std.algorithm : map;
import std.array : uninitializedArray, array;
// It needs 256*size_t.sizeof bytes. Maybe not suitables for
embedded platforms?
size_t[256] buckets;
foreach(ref x; data)
buckets[cast(ubyte)x]++;
// Sort 256 keys using less function at compile time (less
should work at compile time)
static immutable sortedKeys = iota(0, 256).map!(x =>
cast(T)x).array.sort!less.array;
auto result = uninitializedArray!(T[])(data.length);
size_t currentIdx = 0;
// bucket[0] = 3, bucket[1] = 0, bucket[2] = 1 => 0,0,0,2
foreach(ref k; sortedKeys)
{
auto b = buckets[cast(ubyte)k];
if (b)
{
result[currentIdx .. currentIdx + b] = cast(T)k;
currentIdx += b;
if (currentIdx >= data.length) return result;
}
}
assert(0);
}
Maybe those implementations could be useful to improve phobos
sorting?
Andrea
More information about the Digitalmars-d
mailing list