Faster sort?

Thu Apr 7 01:48:41 PDT 2016

On Thursday, 7 April 2016 at 07:57:11 UTC, Andrea Fontana wrote:
> I don't think compiler can remove a random generated array...
> I think times are too small to be measured with a good 
> accuracy, using start() stop() to resume stopwatch seems to add 
> a (fixed) overhead (some microsecs over the 10000 cycles) that 
> doesn't depend on size of array tested.

`boolSort(arr)` has no side effects, and in your benchmark the 
return value is discarded. This means that the compiler is free 
to simply skip that function call entirely.

Because of this, with warnings-as-errors your code won't even 
compile: "Warning: calling app.boolSort!"a<b".boolSort without 
side effects discards return value of type Result, prepend a 
cast(void) if intentional"

> Anyway, it doesn't matter if it is 0 hnsec or some microsecs, 
> in my opinion. It's still faster and faster than original sort 
> (of course it's n vs n*log(n)).

A time of "zero" means that your benchmark is broken, and tells 
you nothing about how fast your code actually is.

Here's a less broken benchmark (it's still got lots of room for 
improvement):

module app;

import std.algorithm, std.conv, std.range, std.random, std.stdio;

auto boolSort(alias less = "a<b")(in bool[] data)
{
    import std.array        : uninitializedArray;
    import std.functional   : binaryFun;

    // Check if order is true/false or false/true
    immutable test = binaryFun!less(true, false);
    size_t count;

    foreach(ref x; data)
       if (x == test) count++;

    return chain(repeat(test,count), repeat(!test, data.length - 
count));
}

void main()
{
     import std.datetime : Duration, StopWatch;

     Duration boolSortDur, stdSortDur;
     ulong boolSortKS, stdSortKS;

     foreach(_; 0..10_000)
     {
         auto arr =
             generate!( () => uniform(0,2))
             .map!(x => cast(bool)x)
             .take(65_536)
             .array;

         {
             StopWatch sw;
             sw.start;
             auto sorted = boolSort(arr);
             sw.stop;
             boolSortDur += sw.peek.to!Duration;

             /* Scan the sorted result so that the compiler
                doesn't elide the boolSort(arr) call: */
             foreach(bool b; sorted)
                 boolSortKS += b;
         }

         {
             StopWatch sw;
             sw.start;
             auto sorted = sort(arr);
             sw.stop;
             stdSortDur += sw.peek.to!Duration;

             /* Scan the sorted result so that the compiler
                doesn't elide the sort(arr) call: */
             foreach(bool b; sorted)
                 stdSortKS += b;
         }
     }

     /* Make some I/O conditional upon the sorted results
        to establish a direct causal chain between the
        code being benchmarked, and something that we
        know cannot ever be removed by the optimizer: */
     if(boolSortKS != stdSortKS)
         writeln("Keep sum mismatch!");

     writefln("boolSort(): %s", boolSortDur);
     writefln("std sort(): %s", stdSortDur);
}

Results on my 3.2GHz Haswell 64-bit Linux system:

DMD 64-bit:
     boolSort(): 2 secs, 886 ms, 915 μs, and 5 hnsecs
     std sort(): 15 secs, 135 ms, 900 μs, and 8 hnsecs

DMD 32-bit:
     boolSort(): 2 secs, 827 ms, 13 μs, and 6 hnsecs
     std sort(): 16 secs, 668 ms, 900 μs, and 2 hnsecs

LDC 64-bit:
     boolSort(): 369 ms, 590 μs, and 6 hnsecs
     std sort(): 13 secs, 798 ms, 107 μs, and 8 hnsecs

So, your boolSort() is faster as expected - but not infinitely so.

Andrei gave a lecture on optimization a while back:
     http://forum.dlang.org/thread/n6odlg$j2n$1@digitalmars.com
Among the key points he made was basically, "Never assume: 
measure. And, measuring correctly is harder than it looks."