Worst-case performance of quickSort / getPivot

Ivan Kazmenko gassa at mail.ru
Sat Nov 16 17:07:16 PST 2013


On Sunday, 17 November 2013 at 00:18:24 UTC, Chris Cain wrote:
> I think it's more complicated than that. Let's assume for a 
> moment that you've proven that such an unstable sort must exist 
> that is faster (I'm not convinced that it necessarily must take 
> extra work to maintain stability). You have not shown how much 
> faster it might be (it could be only 1% faster) nor how much 
> work it would take to discover (even an ideal pivot choice for 
> quicksort actually cannot be as fast as Timsort on an already 
> sorted sequence, and quicksort is not an appropriate algorithm 
> for accepting presorted subsequences). If it takes 2 years to 
> come up with an unstable sort faster than Timsort, then it 
> seems like a long time to be using something that isn't the 
> best that we currently have. Especially if D is to be used in 
> benchmarks against other languages.

Regarding an ideal pivot choice for quicksort, I'd like to 
emphasize that it is simply non-existent.  Here's why.

Let us measure quicksort performance as the number of comparisons 
it makes.

Let us make some assumptions.  The quicksort in question 
proceeds as follows:
--(1) choose a pivot p in O(1) comparisons;
--(2) partition the segment into two;
--(3) run quicksort recursively on the two resulting segments.

Now, (2) is effectively done by comparing every element with p, 
and thus takes Theta(n) (on the order of n) comparisons, where n 
is the length of the current segment.
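
For concreteness, here is a minimal D sketch of a quicksort of 
this shape (my own illustration, not code from Phobos).  It 
permutes an array of indices and sees the data only through a 
comparison function less(i, j); the first-element pivot stands in 
for any of the O(1)-comparison choices:

import std.algorithm.mutation : swap;

// A quicksort of the assumed shape.  It never touches the data
// directly: it permutes the index array idx and consults only
// less(i, j), which answers whether a[i] < a[j].
void quickSort(size_t[] idx, bool delegate(size_t, size_t) less)
{
    if (idx.length < 2)
        return;
    auto p = idx[0];              // (1) pivot chosen in O(1) comparisons
    size_t m = 0;
    foreach (k; 1 .. idx.length)  // (2) partition: Theta(n) comparisons
        if (less(idx[k], p))
        {
            ++m;
            swap(idx[m], idx[k]);
        }
    swap(idx[0], idx[m]);
    quickSort(idx[0 .. m], less);      // (3) recurse on both parts
    quickSort(idx[m + 1 .. $], less);
}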

Under these assumptions, we can construct a killer array for an 
arbitrary quicksort of this kind, even if we don't know how 
exactly the pivot is chosen (say, the implementation is closed 
source or obfuscated), as long as we have the following interface 
to it:

Instead of accessing the array a[] directly, quicksort calls the 
function less(i,j), which tells it whether a[i] < a[j], and we 
control that function.

Now, we start with all values in array a[] undefined.  With each 
a[i], we associate a counter c[i]: how many times it took part in 
a comparison.  As we can see from the above, during a single call 
to quicksort, in steps (1) and (2) the pivot will eventually get 
Theta(n) comparisons, while other elements will all get only O(1) 
comparisons each.

So here's what we'll do.

1. When a comparison asks to relate two undefined values, we pick 
the one with the larger c[i] and make it the lowest number 
possible.  That is, the first value to be fixed will be 0, the 
next 1, and so on.  (In the end, a[] will be a permutation of 0, 
1, ..., n-1.)

2. When we are asked to relate a defined value to an undefined 
one, the defined value will always be lower (because, when we 
eventually define the undefined one, it will be higher than all 
the values defined before it).

3. When we are asked to relate two defined values, we just tell 
the truth.
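
These three rules fit in a few lines of D.  Here is a sketch (the 
names are mine; McIlroy calls the undefined values "gas" and the 
act of defining one "freezing"):

// The adversary.  All of a[] starts undefined; values get frozen
// to 0, 1, 2, ... only when a comparison forces us to commit.
struct Adversary
{
    enum gas = -1;    // marker: a[i] is still undefined
    int[] a;          // the values we are inventing on the fly
    int[] c;          // c[i]: comparisons a[i] has taken part in
    int next = 0;     // lowest value not yet handed out
    size_t ncmp;      // total number of comparisons answered

    this(size_t n)
    {
        a = new int[n];
        c = new int[n];
        a[] = gas;
    }

    // Answers "a[i] < a[j]?" while constructing the killer input.
    bool less(size_t i, size_t j)
    {
        ++ncmp;
        ++c[i];
        ++c[j];
        if (a[i] == gas && a[j] == gas)
        {
            // Rule 1: freeze the busier element (the likely
            // pivot) as the lowest value still available.
            if (c[i] >= c[j])
                a[i] = next++;
            else
                a[j] = next++;
        }
        // Rule 2: a defined value is lower than an undefined one.
        if (a[i] == gas)
            return false;
        if (a[j] == gas)
            return true;
        // Rule 3: both values defined -- tell the truth.
        return a[i] < a[j];
    }
}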

This simple algorithm ensures that, on each call to quicksort, we 
quickly fix the pivot as one of the O(1) lowest values on the 
segment.  So, one of the next two segments will have length 
n-O(1), and the recursion gives us Theta(n) segments of linearly 
decreasing lengths, and thus Theta(n^2) comparisons in total.
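
Wiring the two sketches above together shows the blow-up; this 
driver is hypothetical, but any quicksort of the assumed shape 
would do:

import std.stdio : writeln;

void main()
{
    enum n = 1000;
    auto adv = Adversary(n);

    auto idx = new size_t[n];
    foreach (k; 0 .. idx.length)
        idx[k] = k;

    quickSort(idx, &adv.less);

    // Roughly n^2/4 = 250000 comparisons for this pivot choice,
    // instead of the expected n * log2(n), about 10000.
    writeln(adv.ncmp);
}

If the sort is deterministic, the values frozen during the run 
(plus arbitrary values for any cells still undefined at the end) 
form a concrete array that reproduces the quadratic behavior on a 
second run.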

Now, the assumption of picking a pivot in O(1) comparisons covers 
a broad variety of pivot choices: first/last/middle/random 
element, median of three or five, a median of medians of O(1) 
elements (e.g., the "ninther"), or any combination of these.  The 
random pick fails in the following sense: if we seed the RNG, 
construct a killer case, and then run again with the same seed, 
we get the Theta(n^2) behavior reproduced.

Double-pivot quicksort can be forced to go quadratic in a similar 
fashion.

Now, introsort escapes this fate by switching to a 
guaranteed-O(n log n) algorithm (typically heapsort) for the 
current segment instead of (3) when the recursion goes too deep.  
This comes at the (small) cost of checking the current depth on 
every call to quicksort.
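
A sketch of that guard, on top of the quickSort above (again my 
illustration, not the actual introsort source; heapSort is a 
plain heapsort serving as the fallback):

import core.bitop : bsr;

// Same quicksort, but with a depth budget of about 2*log2(n).
// When the budget runs out, the segment is finished by a
// guaranteed O(n log n) fallback, so the adversary can force at
// most O(log n) bad levels.
void introSort(size_t[] idx, bool delegate(size_t, size_t) less)
{
    introSort(idx, less, idx.length < 2 ? 0 : 2 * bsr(idx.length));
}

void introSort(size_t[] idx, bool delegate(size_t, size_t) less,
               size_t depth)
{
    if (idx.length < 2)
        return;
    if (depth == 0)
    {
        heapSort(idx, less);  // the guaranteed O(n log n) escape
        return;
    }
    auto p = idx[0];
    size_t m = 0;
    foreach (k; 1 .. idx.length)
        if (less(idx[k], p))
        {
            ++m;
            swap(idx[m], idx[k]);
        }
    swap(idx[0], idx[m]);
    introSort(idx[0 .. m], less, depth - 1);
    introSort(idx[m + 1 .. $], less, depth - 1);
}

// Plain heapsort on the index array under the same comparator.
void heapSort(size_t[] idx, bool delegate(size_t, size_t) less)
{
    void siftDown(size_t root, size_t end)
    {
        for (size_t child; (child = 2 * root + 1) < end; root = child)
        {
            if (child + 1 < end && less(idx[child], idx[child + 1]))
                ++child;  // pick the larger of the two children
            if (!less(idx[root], idx[child]))
                return;
            swap(idx[root], idx[child]);
        }
    }
    foreach_reverse (r; 0 .. idx.length / 2)  // build a max-heap
        siftDown(r, idx.length);
    foreach_reverse (e; 1 .. idx.length)      // extract maxima
    {
        swap(idx[0], idx[e]);
        siftDown(0, e);
    }
}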

The above is just my retelling of a great short article "A Killer 
Adversary for Quicksort" by M. D. McIlroy here: 
http://www.cs.dartmouth.edu/~doug/mdmspe.pdf

Ivan Kazmenko.

