Timsort vs some others
Xinok
xinok at live.com
Mon Dec 17 22:52:25 PST 2012
On Monday, 17 December 2012 at 15:28:36 UTC, bearophile wrote:
> Regarding the recent Phobos improvements that introduce a
> Timsort:
>
> http://forum.dlang.org/thread/50c8a4e67f79_3fdd19b7ae814691e@sh3.rs.github.com.mail
>
> I have found a blog post that compares the performance of
> Timsort, Smoothsort, and std::stable_sort:
>
> http://www.altdevblogaday.com/2012/06/15/smoothsort-vs-timsort/
>
> Bye,
> bearophile
While Smoothsort may be mathematically sound, it simply doesn't
translate well to computer hardware. It's a variant of heap sort
which requires a great deal of random access, whereas Timsort is
a variant of merge sort which is largely sequential and benefits
from the CPU cache. Furthermore, the Leonardo heap is much more
computationally expensive than a typical binary or ternary heap.
Both Timsort and Smoothsort are what you call "natural sorts,"
meaning they typically require fewer comparisons on data with low
entropy. They're also rather complex which means added overhead.
When sorting primitive types like int, comparisons are
inexpensive, so the overhead makes these algorithms slower. But
had he tested it on a data type like strings, then we'd likely
see Timsort take the lead.
On purely random data, quick sort and merge sort will win most of
the time. But Timsort has an advantage over Smoothsort of being
an adaptive algorithm; the so called "galloping mode," which is
computationally expensive, is only activated when minGallop
reaches a certain threshold and therefore beneficial. Otherwise,
a simple linear merge is used (i.e. merge sort).
On another note, I highly doubt that std::sort uses a "median of
medians" algorithm, which would add much overhead and essentially
double the number of comparisons required with little to no
benefit. More likely, it simply chooses the pivot from a median
of three.
More information about the Digitalmars-d
mailing list