Timsort vs some others

Xinok xinok at live.com
Mon Dec 17 22:52:25 PST 2012


On Monday, 17 December 2012 at 15:28:36 UTC, bearophile wrote:
> Regarding the recent Phobos improvements that introduce a 
> Timsort:
>
> http://forum.dlang.org/thread/50c8a4e67f79_3fdd19b7ae814691e@sh3.rs.github.com.mail
>
> I have found a blog post that compares the performance of 
> Timsort, Smoothsort, and std::stable_sort:
>
> http://www.altdevblogaday.com/2012/06/15/smoothsort-vs-timsort/
>
> Bye,
> bearophile

While Smoothsort may be mathematically sound, it simply doesn't 
translate well to computer hardware. It's a variant of heap sort 
which requires a great deal of random access, whereas Timsort is 
a variant of merge sort which is largely sequential and benefits 
from the CPU cache. Furthermore, the Leonardo heap is much more 
computationally expensive than a typical binary or ternary heap.

Both Timsort and Smoothsort are what you call "natural sorts," 
meaning they typically require fewer comparisons on data with low 
entropy. They're also rather complex which means added overhead. 
When sorting primitive types like int, comparisons are 
inexpensive, so the overhead makes these algorithms slower. But 
had he tested it on a data type like strings, then we'd likely 
see Timsort take the lead.

On purely random data, quick sort and merge sort will win most of 
the time. But Timsort has an advantage over Smoothsort of being 
an adaptive algorithm; the so called "galloping mode," which is 
computationally expensive, is only activated when minGallop 
reaches a certain threshold and therefore beneficial. Otherwise, 
a simple linear merge is used (i.e. merge sort).

On another note, I highly doubt that std::sort uses a "median of 
medians" algorithm, which would add much overhead and essentially 
double the number of comparisons required with little to no 
benefit. More likely, it simply chooses the pivot from a median 
of three.


More information about the Digitalmars-d mailing list