Worst-case performance of quickSort / getPivot

Vladimir Panteleev vladimir at thecybershadow.net
Fri Nov 15 13:46:23 PST 2013


I've been investigating an instance that I ran into while 
processing some data, in which multiSort had apparently stalled 
whereas two successive sort calls processed the data quickly. 
After reducing the code and 20 million points, I've reduced the 
problem to this test case:

enum N=1000000;
N.iota.retro.chain(N.only).array.sort();

Apparently, this simple case exhibits worst-case (quadratic) 
complexity in our sort implementation. The .retro isn't necessary 
- this also exhibits quadratic complexity (athough the total 
number of predicate calls is halved), because the first quickSort 
cycle will flip the order of elements during partitioning:

N.iota.chain(0.only).array.sort();

Closer inspection shows that the problem is manifested due to how 
the pivot is chosen. Currently, std.algorithm's getPivot uses the 
median-of-three method (median value of first, middle and last 
element[1]). However, in this case, the median element will 
always be the first element - this remains true in all quickSort 
iterations.

Here is an illustrated example with N=9. Note that at the same 
time as choosing the pivot, getPivot also orders the candidates 
(first, middle and last range elements) according to the sort 
predicate. After calling getPivot, quickSortImpl moves the pivot 
to the end of the range by swapping it with the last element.

getPivot(0..10)
8,7,6,5,4,3,2,1,0,9 <- getPivot - before swap
9,7,6,5,4,8,2,1,0,3 <- getPivot - after swap
9,7,6,5,4,3,2,1,0,8 <- quickSortImpl - after swap
9,8,6,5,4,3,2,1,0,7 <- quickSortImpl - after partition
getPivot(2..10)
6,5,4,3,2,1,0,7 <- getPivot - before swap
7,5,4,3,6,1,0,2 <- getPivot - after swap
7,5,4,3,2,1,0,6 <- quickSortImpl - after swap
7,6,4,3,2,1,0,5 <- quickSortImpl - after partition
(...)

The algorithm gets stuck on "slopes" in sorted data, which are 
not uncommon in real-world data (my data resembled a 1D 
heightmap). Although the median-of-three way of picking a pivot 
is recommended by some sources (notably Sedgewick), perhaps a 
better method would be better suitable for Phobos:

* Many sources recommend using a random element as a pivot. 
According to [2], "Randomized quicksort, for any input, it 
requires only O(n log n) expected time (averaged over all choices 
of pivots)". Also, if it is not possible to predict the pivot 
choice, it is impossible to craft worst-case input, which is a 
plus from a security point[3]. However, I'm not sure if making 
the behavior of std.algorithm's sort nondeterministic is 
desirable.
* It looks like many STL implementations use Introsort[4] - a 
hybrid sorting algorithm which starts with QuickSort, but 
switches to HeapSort if the recursion limit exceeds a threshold.
* I found a recent article[5] which describes an optimal 
algorithm of picking a pivot, however it is behind a paywall.

[1]: Apparently the term is also used to refer to choosing three 
points *randomly*, instead of first/middle/last: 
http://stackoverflow.com/a/164183/21501
[2]: 
http://en.wikipedia.org/wiki/Quicksort#Analysis_of_Randomized_quicksort
[3]: 
https://www.google.com/search?q=quicksort+worst+case+denial+of+service
[4]: http://en.wikipedia.org/wiki/Introsort
[5]: https://news.ycombinator.com/item?id=6629117


More information about the Digitalmars-d mailing list