Worst-case performance of quickSort / getPivot

Chris Cain clcain at uncg.edu
Sat Nov 16 17:29:53 PST 2013


On Sunday, 17 November 2013 at 00:56:46 UTC, Andrei Alexandrescu 
wrote:
> Very simple. Any sequence with a large number of equivalent 
> elements will cause timsort to work more than an unstable 
> algorithm, because it would need to shift around those 
> elements. (Consider e.g. 1, 0, 0, 0, ..., 0.) The discussion 
> then shifts to how many equivalent elements are in the typical 
> range etc.

Ah, very nicely described. So a stable sort can take more time 
when there is more than one equal element than a theoretical 
unstable sort.

> There are well known tweaks to quicksort that make it operate 
> well on sorted or almost sorted sequences.

I'd have to see them to comment on them. I'd doubt that they only 
use quicksort if they support taking advantage of large parts of 
the array being sorted. I'd expect some sort of merging to occur 
in that case.

> We could do that, but that would be masking a bug instead of 
> fixing it. We should instead focus on fixing unstable sort. One 
> additional issue with making stable sort the default for a 
> while is that people will grow to assume sort() is stable, and 
> will have broken code when we switch back.
>
> Let's fix unstable sort. If it's slower on any other data than 
> Timsort's happy case, we have a bug.

Sure, but could we at least agree on a timescale? If we can't 
come up with something significantly better than Timsort in the 
next year, it seems pointless to hold out for something that may 
never come. Hidden bug or not, deliberately exposing a bug that 
has no effective plan to be fixed isn't a good tactic either. 
Especially if D is expected to be competitive to other native 
languages in speed. Seeing D falter in the "sort" column of a 
shootout could be disconcerting to some and we can't count on 
everyone knowing to specify stable sort in those head-to-heads 
until we work out improving unstable sort.

That said, your point that people might grow comfortable with 
sort being stable is a good one. Unfortunately, I don't have an 
answer for that. Documentation detailing that sort is not 
guaranteed a particular stability is about the best that could be 
done. Perhaps a variant of Timsort could be prepared that 
deliberately changes the order of equal elements. Designed 
instability could suffice as a stopgap against that (at least 
until the unstable sort is worked out).


More information about the Digitalmars-d mailing list