Worst-case performance of quickSort / getPivot
Chris Cain
clcain at uncg.edu
Sat Nov 16 21:21:23 PST 2013
On Sunday, 17 November 2013 at 04:14:22 UTC, Andrei Alexandrescu
wrote:
> Probably a good step forward would be to hook a sort
> benchmarking corpus to our unittests. What are the consecrated
> corpora?
I'll kick off the suggestions with this:
File to provide "real" data:
ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Escherichia_coli_K_12_substr__MG1655_uid57779/NC_000913.fna
From http://www.genome.jp/kegg-bin/show_organism?org=eco ->
RefSeq -> NC_000913.fna
(despite this being real data, typically you don't actually sort
genomes like this ... it's simply "more interesting" than a
pseudo-randomized dataset)
Code:
---
import std.range, std.algorithm, std.stdio, std.array, std.conv,
std.datetime;
enum strategy = SwapStrategy.stable;
void main() {
auto arr = File("NC_000913.fna").byLine
.drop(1) // Trim the first line because it's just
metadata
/*.take(50)*/
.join
.chunks(50)
.map!text
.array;
StopWatch sw = StopWatch(AutoStart.yes);
arr.sort!("a < b", strategy)();
sw.stop();
writeln(text(arr.length, " elements sorted in ",
sw.peek.msecs, " msecs"));
}
---
On my system, this shows unstable sort to be ~2x faster than
stable sort (which is to be expected of data with no discernible
patterns).
The "key" thing this is testing is the effectiveness of the sort
on data that often takes longer to compare (with such a small
alphabet, many strings have a common prefix).
That said, it might also be reproduced "well enough" using a
random generator to create similar strings to sort, but the basic
idea is there. I just like using real genomes for performance
testing things :)
More information about the Digitalmars-d
mailing list