Quadratic time to sort in a simple case?
Ivan Kazmenko
gassa at mail.ru
Tue Apr 23 14:09:40 PDT 2013
On Monday, 22 April 2013 at 21:34:47 UTC, Andrei Alexandrescu
wrote:
> a) choose median of five
Sounds like it could decrease speed on average, and still perhaps
fail on a simple case like [0, 1, 2, 3, ..., 999, 0, 0] or
something? Didn't test it though.
> b) if the array is large, sort a stride of it first (e.g. 1%)
> then choose the pivot as the median point of that stride
A bad case would be, e.g., the array containing almost sorted
parts, and the stride getting taken from the part where lowest or
highest elements reside.
> c) add introspection to the algorithm, i.e. if an attempt to
> partition hits the worst case or near worst case, just try
> another pivot instead of moving forward with the sorting stage.
Now, trying another pivot again and again could give the same
O(n^2) in a bad case.
> One simple solution to this (which I forgot to mention) is
> picking a pivot at random.
But it would again mean that the sort function depends on global
state (RNG state). And if it doesn't (a local RNG is initialized
somehow each time), an adversary would just have to hardcode it
once to get the same O(n^2) worst case anytime he wants.
And on Tuesday, 23 April 2013 at 01:10:26 UTC, Xinok wrote:
> I filed a bug report for this issue a year ago:
> http://d.puremagic.com/issues/show_bug.cgi?id=7767
>
> I've been meaning to fix this issue myself. Time allowing, I'll
> do it soon.
What I wonder now, what would be the goal of such a fix?
One possible goal would be to have an O(n log n) worst case sort.
And that would perhaps cost some speed and/or memory on average.
Besides, such a sort function is already implemented (TimSort),
so it's just a matter of setting the default then.
Another goal would be to have O(n^2) only for cases which don't
have a reasonable chance to occur in real world, but could still
be constructed artificially by an adversary. And that could
perhaps be done by just changing the getPivot implementation.
Other languages' experience may be useful here:
1. GNU C++ 4.x std::sort implementation is IntroSort [1].
2. Microsoft .NET used Quicksort up to version 4.0 [2].
Now in .NET 4.5 they use Introsort too [3].
3. For primitive types, Java 6 used a tuned QuickSort [4].
The current Java 7 uses Dual-Pivot QuickSort [5].
Java uses TimSort for Objects [6].
4. Python uses TimSort since version 2.3 [7].
In any case, there are techniques to construct a bad case given a
QuickSort implementation. One of them is described by M. Douglas
McIlroy in "A Killer Adversary for Quicksort" [8]. We run
QuickSort on a would-be-array "a" controlling the "a[i] < a[j]"
predicate, and, using only certain assumptions (not looking at
the code!), we build a killer-case array on the fly. Given some
thought, the technique could perhaps be extended to Dual-Pivot
QuickSort too. As long as some flavor of QuickSort (and topN) is
in Phobos, such a unittest would fit the sort implementation
nicely, if only to show just how "bad" it can get.
Ivan Kazmenko.
-----
References:
[1]
http://gcc.gnu.org/onlinedocs/libstdc++/libstdc++-html-USERS-4.4/a01027.html#g152148508b4a39e15ffbfbc987ab653a
[2]
http://msdn.microsoft.com/en-us/library/6tf1f0bc%28v=vs.100%29.aspx
[3]
http://msdn.microsoft.com/en-us/library/6tf1f0bc%28v=vs.110%29.aspx
[4]
http://docs.oracle.com/javase/6/docs/api/java/util/Arrays.html#sort%28int[]%29
[5]
http://docs.oracle.com/javase/7/docs/api/java/util/Arrays.html#sort%28int[]%29
[6]
http://docs.oracle.com/javase/7/docs/api/java/util/Arrays.html#sort%28java.lang.Object[]%29
[7] http://en.wikipedia.org/wiki/Timsort
[8] http://www.cs.dartmouth.edu/~doug/mdmspe.pdf
More information about the Digitalmars-d-learn
mailing list