topN using a heap

Jon Degenhardt via Digitalmars-d digitalmars-d at puremagic.com
Sun Sep 25 01:19:04 PDT 2016


On Saturday, 24 September 2016 at 18:22:51 UTC, Andrei 
Alexandrescu wrote:
> Got curious so I tried to patch my ldc installation (0.17.1 
> (DMD v2.068.2, LLVM 3.8.0)), but sadly that didn't work out 
> because sorting.d uses the new syntax in various places.
>
> Probably the best baseline is the equivalent C++ code using the 
> mature implementation of nth_element, see 
> http://paste.ofcode.org/ieZPdchjzTXbDcG3LvsYBP (also pasted at 
> the end of this message). Here's what I got: 
> http://paste.ofcode.org/6feyBLRiJtieHdZ3bmGaUW, see also text 
> below.
>
> The topN code produced with dmd is faster in virtually all 
> cases (has parity for -f 4, which I suspect is a degenerate 
> case with most elements equal, which exercises only a small 
> part of the algorithm). For C++ I used -O3 -DNDEBUG, any other 
> flags I should use?
>
> [snip]
>
I added the topN pull request to a local LDC build (current LDC 
master). Also built the C++ nth_element and sort program you 
listed. (GCC 4.9.3, compiler line: g++-mp-4.9 -O3 -static-libgcc 
-static-libstdc++ -std=c++11).

Did 7 runs for each variant on the three fields in ngram file. 
Median times are below.

Median sort times (ms):

      Field 2   Field 3   Field 4
DMD      351       348       326
LDC      273       261       245
C++      281       260       246

Median topN / nth_element times (ms):

      Field 2   Field 3   Field 4
LDC       21        44        57
C++       41        60        71

Looking very good indeed!


More information about the Digitalmars-d mailing list