Worst-case performance of quickSort / getPivot
Chris Cain
clcain at uncg.edu
Sat Nov 16 17:29:53 PST 2013
On Sunday, 17 November 2013 at 00:56:46 UTC, Andrei Alexandrescu
wrote:
> Very simple. Any sequence with a large number of equivalent
> elements will cause timsort to work more than an unstable
> algorithm, because it would need to shift around those
> elements. (Consider e.g. 1, 0, 0, 0, ..., 0.) The discussion
> then shifts to how many equivalent elements are in the typical
> range etc.
Ah, very nicely described. So a stable sort can take more time
when there is more than one equal element than a theoretical
unstable sort.
> There are well known tweaks to quicksort that make it operate
> well on sorted or almost sorted sequences.
I'd have to see them to comment on them. I'd doubt that they only
use quicksort if they support taking advantage of large parts of
the array being sorted. I'd expect some sort of merging to occur
in that case.
> We could do that, but that would be masking a bug instead of
> fixing it. We should instead focus on fixing unstable sort. One
> additional issue with making stable sort the default for a
> while is that people will grow to assume sort() is stable, and
> will have broken code when we switch back.
>
> Let's fix unstable sort. If it's slower on any other data than
> Timsort's happy case, we have a bug.
Sure, but could we at least agree on a timescale? If we can't
come up with something significantly better than Timsort in the
next year, it seems pointless to hold out for something that may
never come. Hidden bug or not, deliberately exposing a bug that
has no effective plan to be fixed isn't a good tactic either.
Especially if D is expected to be competitive to other native
languages in speed. Seeing D falter in the "sort" column of a
shootout could be disconcerting to some and we can't count on
everyone knowing to specify stable sort in those head-to-heads
until we work out improving unstable sort.
That said, your point that people might grow comfortable with
sort being stable is a good one. Unfortunately, I don't have an
answer for that. Documentation detailing that sort is not
guaranteed a particular stability is about the best that could be
done. Perhaps a variant of Timsort could be prepared that
deliberately changes the order of equal elements. Designed
instability could suffice as a stopgap against that (at least
until the unstable sort is worked out).
More information about the Digitalmars-d
mailing list