Faster sort?

Andrea Fontana via Digitalmars-d digitalmars-d at puremagic.com
Wed Apr 6 01:15:39 PDT 2016


I did a couple of tries using a "specialized" implementation of 
sort!less.

For example using a counting sort for bool seems a good idea. 
This code:

auto boolSort(alias less = "a<b")(in bool[] data)
{
    import std.array        : uninitializedArray;
    import std.functional   : binaryFun;

    // Check if order is true/false or false/true
    immutable test = binaryFun!less(true, false);
    size_t count;

    foreach(ref x; data)
       if (x == test) count++;

    return chain(repeat(test,count), repeat(!test, data.length - 
count));
}

Count number of false (or true, it depends on result of "less"), 
and then chain them.

I did a test over 10000 randomized array of 2,4,8,16...65536 
elements.

rdmd -O -release -noboundscheck -inline sort.d :

2 inputs:
Faster: 229 μs and 9 hnsecs
Phobos: 215 μs and 5 hnsecs

...

64 inputs:
Faster: 513 μs and 4 hnsecs
Phobos: 2 ms, 143 μs, and 5 hnsecs

...

65536 inputs:
Faster: 2 secs, 700 ms, and 696 μs
Phobos: 6 secs, 835 ms, and 319 μs

Using ldmd2 -O -release -noboundscheck -inline sort.d && ./sort  
instead:

2 inputs:
Faster: 0 hnsecs
Phobos: 33 μs and 5 hnsecs

...

65536 inputs:
Faster: 0 hnsecs (???)
Phobos: 7 secs, 865 ms, 450 μs, and 6 hnsecs


The same thing works for byte[] and ubyte[] array using a few 
tricks (eg: using counting sort and then sorting only keys).

So for ubyte[]/byte[] (ldc):

2 inputs:
1 ms, 305 μs, and 4 hnsecs
31 μs and 4 hnsecs

...

256 inputs:
8 ms, 76 μs, and 3 hnsecs
8 ms, 807 μs, and 3 hnsecs

...

65536 inputs:
347 ms, 330 μs, and 9 hnsecs
14 secs, 214 ms, 66 μs, and 5 hnsecs

Anyway implementation for byte/ubyte should be improved as long 
as it probably won't work fine if less is a functor or delegate:

auto byteSort(alias less = "a<b", T)(in T[] data)
if (is(T == byte) || is(T == ubyte))
{
    import std.range     : iota;
    import std.algorithm : map;
    import std.array     : uninitializedArray, array;

    // It needs 256*size_t.sizeof bytes. Maybe not suitables for 
embedded platforms?
    size_t[256] buckets;

    foreach(ref x; data)
       buckets[cast(ubyte)x]++;

    // Sort 256 keys using less function at compile time (less 
should work at compile time)
    static immutable sortedKeys = iota(0, 256).map!(x => 
cast(T)x).array.sort!less.array;

    auto result = uninitializedArray!(T[])(data.length);
    size_t currentIdx = 0;

    // bucket[0] = 3, bucket[1] = 0, bucket[2] = 1 => 0,0,0,2
    foreach(ref k; sortedKeys)
    {
       auto b = buckets[cast(ubyte)k];

       if (b)
       {
          result[currentIdx .. currentIdx + b] = cast(T)k;
          currentIdx += b;
          if (currentIdx >= data.length) return result;
       }
    }

    assert(0);
}


Maybe those implementations could be useful to improve phobos 
sorting?

Andrea



More information about the Digitalmars-d mailing list