Quadratic time to sort in a simple case?

Ivan Kazmenko gassa at mail.ru
Tue Apr 23 14:09:40 PDT 2013


On Monday, 22 April 2013 at 21:34:47 UTC, Andrei Alexandrescu 
wrote:

> a) choose median of five

Sounds like it could decrease speed on average, and still perhaps 
fail on a simple case like [0, 1, 2, 3, ..., 999, 0, 0] or 
something?  Didn't test it though.

> b) if the array is large, sort a stride of it first (e.g. 1%) 
> then choose the pivot as the median point of that stride

A bad case would be, e.g., the array containing almost sorted 
parts, and the stride getting taken from the part where lowest or 
highest elements reside.

> c) add introspection to the algorithm, i.e. if an attempt to 
> partition hits the worst case or near worst case, just try 
> another pivot instead of moving forward with the sorting stage.

Now, trying another pivot again and again could give the same 
O(n^2) in a bad case.

> One simple solution to this (which I forgot to mention) is
> picking a pivot at random.

But it would again mean that the sort function depends on global 
state (RNG state).  And if it doesn't (a local RNG is initialized 
somehow each time), an adversary would just have to hardcode it 
once to get the same O(n^2) worst case anytime he wants.

And on Tuesday, 23 April 2013 at 01:10:26 UTC, Xinok wrote:
> I filed a bug report for this issue a year ago:
> http://d.puremagic.com/issues/show_bug.cgi?id=7767
>
> I've been meaning to fix this issue myself. Time allowing, I'll 
> do it soon.

What I wonder now, what would be the goal of such a fix?

One possible goal would be to have an O(n log n) worst case sort. 
  And that would perhaps cost some speed and/or memory on average. 
  Besides, such a sort function is already implemented (TimSort), 
so it's just a matter of setting the default then.

Another goal would be to have O(n^2) only for cases which don't 
have a reasonable chance to occur in real world, but could still 
be constructed artificially by an adversary.  And that could 
perhaps be done by just changing the getPivot implementation.  
Other languages' experience may be useful here:

1. GNU C++ 4.x std::sort implementation is IntroSort [1].
2. Microsoft .NET used Quicksort up to version 4.0 [2].
    Now in .NET 4.5 they use Introsort too [3].
3. For primitive types, Java 6 used a tuned QuickSort [4].
    The current Java 7 uses Dual-Pivot QuickSort [5].
    Java uses TimSort for Objects [6].
4. Python uses TimSort since version 2.3 [7].

In any case, there are techniques to construct a bad case given a 
QuickSort implementation.  One of them is described by M. Douglas 
McIlroy in "A Killer Adversary for Quicksort" [8].  We run 
QuickSort on a would-be-array "a" controlling the "a[i] < a[j]" 
predicate, and, using only certain assumptions (not looking at 
the code!), we build a killer-case array on the fly.  Given some 
thought, the technique could perhaps be extended to Dual-Pivot 
QuickSort too.  As long as some flavor of QuickSort (and topN) is 
in Phobos, such a unittest would fit the sort implementation 
nicely, if only to show just how "bad" it can get.

Ivan Kazmenko.

-----

References:
[1] 
http://gcc.gnu.org/onlinedocs/libstdc++/libstdc++-html-USERS-4.4/a01027.html#g152148508b4a39e15ffbfbc987ab653a
[2] 
http://msdn.microsoft.com/en-us/library/6tf1f0bc%28v=vs.100%29.aspx
[3] 
http://msdn.microsoft.com/en-us/library/6tf1f0bc%28v=vs.110%29.aspx
[4] 
http://docs.oracle.com/javase/6/docs/api/java/util/Arrays.html#sort%28int[]%29
[5] 
http://docs.oracle.com/javase/7/docs/api/java/util/Arrays.html#sort%28int[]%29
[6] 
http://docs.oracle.com/javase/7/docs/api/java/util/Arrays.html#sort%28java.lang.Object[]%29
[7] http://en.wikipedia.org/wiki/Timsort
[8] http://www.cs.dartmouth.edu/~doug/mdmspe.pdf


More information about the Digitalmars-d-learn mailing list