Quadratic time to sort in a simple case?

Andrei Alexandrescu SeeWebsiteForEmail at erdani.org
Mon Apr 22 14:34:54 PDT 2013


On 4/19/13 6:37 PM, Ivan Kazmenko wrote:
>> That [SwapStrategystable] uses the TimSort that contains
>> the optimization you look for.
>>
>> alias mySort = sort!("a < b", SwapStrategy.stable);
>
> Thank you, these are good for now.
>
> A seemingly easy way to provide the choice globally would be to have an
> assignable SwapStrategy.default in std.algorithm... or would that be a
> bad design decision to have such a global state?

Yah, that would mean one can't reason what a piece of code does without 
knowing what the context was.

>>> Consider a sorted array. Append an element that is less than all the
>>> previous elements. Then sort the array again by the sort function
>>> from std.algorithm.
>>
>> If you know that, then don't do a sort. Make some free space in
>> the array, shift the items, sort just the first part with a slice.
>
> That behavior isn't always easy to predict.
>
> Still, I think this is a problem in Phobos which should be fixed. In
> most implementations of quicksort (even middle-element, the easiest to
> come up with), one expects it to perform in O(n log n) with probability
> close to one, except on some artificially constructed cases.
>
> On a side note, I'm surprised that median-of-three (getPivot in
> std.algorithm) fails at such a simple test, so I had something new to
> learn there, too. If I just comment out the switch in getPivot and
> "return mid" directly, it becomes fast enough in this case.

I could have sworn I made getPivot choose at random at some point. I 
agree this should be fixed; there are a number of possibilities:

a) choose median of five

b) if the array is large, sort a stride of it first (e.g. 1%) then 
choose the pivot as the median point of that stride

c) add introspection to the algorithm, i.e. if an attempt to partition 
hits the worst case or near worst case, just try another pivot instead 
of moving forward with the sorting stage.

> Another view at this is the following. As far as I know by now, one of
> the points beyond D (and Phobos) design is to have everything safe by
> default, but faster if you want it explicitly and you know what you are
> doing.

That view of safety is different (memory safety).

> In this particular case, that would mean having a worst-case O(n
> log n) sort, and/or a stable sort, by default - but with the opportunity
> to switch to the (time-wise and stability-wise) unsafe quicksort if you
> really need the extra speedup and know what you are doing. So, why isn't
> TimSort the default?

TimSort is slower on average.


Andrei


More information about the Digitalmars-d-learn mailing list