topN using a heap
Andrei Alexandrescu via Digitalmars-d
digitalmars-d at puremagic.com
Sat Sep 24 11:22:51 PDT 2016
On 09/23/2016 04:45 PM, Jon Degenhardt wrote:
> That's a very nice result. Both the absolute numbers and that all three
> sets achieve similar performance, as they different distribution
> characteristics.
Got curious so I tried to patch my ldc installation (0.17.1 (DMD
v2.068.2, LLVM 3.8.0)), but sadly that didn't work out because sorting.d
uses the new syntax in various places.
Probably the best baseline is the equivalent C++ code using the mature
implementation of nth_element, see
http://paste.ofcode.org/ieZPdchjzTXbDcG3LvsYBP (also pasted at the end
of this message). Here's what I got:
http://paste.ofcode.org/6feyBLRiJtieHdZ3bmGaUW, see also text below.
The topN code produced with dmd is faster in virtually all cases (has
parity for -f 4, which I suspect is a degenerate case with most elements
equal, which exercises only a small part of the algorithm). For C++ I
used -O3 -DNDEBUG, any other flags I should use?
As an aside, I enjoy poking fun at the stunning inefficiency of
iostreams in my public talks; they delivered again :o).
Andrei
=========================== shell
$ g++ -O3 -DNDEBUG -omedian_by_sort_cpp issue16517.cpp
$ g++ -O3 -DNDEBUG -DTOPN -omedian_by_topn_cpp issue16517.cpp
$ dmd -O -inline -release -ofmedian_by_sort -boundscheck=off
issue16517.d
$ dmd -O -inline -release -version=topn -ofmedian_by_topn
-boundscheck=off issue16517.d
$ cut -f 2 /tmp/googlebooks-eng-all-1gram-20120701-0 |
./median_by_sort_cpp
median (via sort): 1972; lines: 10512769; total time (ms): 3419.11; sort
time (ms): 314.086
$ cut -f 2 /tmp/googlebooks-eng-all-1gram-20120701-0 | ./median_by_sort
median (via sort): 1972; lines: 10512769; total time (ms): 1462; sort
time (ms): 399
$ cut -f 2 /tmp/googlebooks-eng-all-1gram-20120701-0 | ./median_by_topn_cpp
median (via nth_element): 1972; lines: 10512769; total time (ms):
3255.41; nth_element time (ms): 40.676
$ cut -f 2 /tmp/googlebooks-eng-all-1gram-20120701-0 | ./median_by_topn
median (via topN): 1972; lines: 10512769; total time (ms): 1211; topN
time (ms): 32
$ cut -f 3 /tmp/googlebooks-eng-all-1gram-20120701-0 | ./median_by_sort_cpp
median (via sort): 3; lines: 10512769; total time (ms): 2949.11; sort
time (ms): 297.814
$ cut -f 3 /tmp/googlebooks-eng-all-1gram-20120701-0 | ./median_by_sort
median (via sort): 3; lines: 10512769; total time (ms): 1351; sort time
(ms): 382
$ cut -f 3 /tmp/googlebooks-eng-all-1gram-20120701-0 | ./median_by_topn_cpp
median (via nth_element): 3; lines: 10512769; total time (ms): 2316.75;
nth_element time (ms): 65.041
$ cut -f 3 /tmp/googlebooks-eng-all-1gram-20120701-0 | ./median_by_topn
median (via topN): 3; lines: 10512769; total time (ms): 973; topN time
(ms): 59
$ cut -f 4 /tmp/googlebooks-eng-all-1gram-20120701-0 | ./median_by_sort_cpp
median (via sort): 2; lines: 10512769; total time (ms): 2614.47; sort
time (ms): 351.285
$ cut -f 4 /tmp/googlebooks-eng-all-1gram-20120701-0 | ./median_by_sort
median (via sort): 2; lines: 10512769; total time (ms): 1269; sort time
(ms): 361
$ cut -f 4 /tmp/googlebooks-eng-all-1gram-20120701-0 | ./median_by_topn_cpp
median (via nth_element): 2; lines: 10512769; total time (ms): 2443.17;
nth_element time (ms): 76.211
$ cut -f 4 /tmp/googlebooks-eng-all-1gram-20120701-0 | ./median_by_topn
median (via topN): 2; lines: 10512769; total time (ms): 998; topN time
(ms): 77
===========================
=========================== issue16517.cpp
#include <iostream>
#include <cstdio>
#include <ctime>
#include <vector>
#include <algorithm>
using namespace std;
int main()
{
clock_t swTotal, swSort;
swTotal = clock();
vector<double> values;
double num;
while (cin >> num) {
values.push_back(num);
}
size_t medianIndex = values.size() / 2;
swSort = clock();
#ifdef TOPN
const char* method = "nth_element";
nth_element(values.begin(), values.begin() + medianIndex,
values.end());
#else
const char* method = "sort";
sort(values.begin(), values.end());
#endif
clock_t done = clock();
printf("median (via %s): %g; lines: %zu; total time (ms): %g; %s
time (ms): %g\n",
method, values[medianIndex], values.size(),
(done - swTotal) * 1000. / CLOCKS_PER_SEC,
method, (done - swSort) * 1000. / CLOCKS_PER_SEC);
}
===========================
=========================== issue16517.d
import std.stdio;
import std.array : appender;
import std.algorithm : sort, topN;
import std.conv;
import std.range;
import std.datetime: StopWatch;
void main()
{
StopWatch swTotal, swSort;
swTotal.start;
Appender!(double[]) values;
foreach (line; stdin.byLine)
values.put(line.to!double);
size_t medianIndex = values.data.length/2;
swSort.start;
version(topn)
{
topN(values.data, medianIndex);
auto method = "topN";
}
else {
sort(values.data);
auto method = "sort";
}
swTotal.stop;
swSort.stop;
writefln("median (via %s): %g; lines: %d; total time (ms): %d; %s
time (ms): %d",
method, values.data[medianIndex], values.data.length,
swTotal.peek.msecs,
method, swSort.peek.msecs);
}
===========================
More information about the Digitalmars-d
mailing list