Should Phobos use Timsort instead? (with a small tweak)
Andrei Alexandrescu
SeeWebsiteForEmail at erdani.org
Mon Nov 7 19:44:22 PST 2011
On 11/7/11 9:28 PM, Xinok wrote:
> On 11/7/2011 7:47 PM, dsimcha wrote:
>> Phobos definitely needs a decent stable sort. I had been meaning to
>> contribute a very highly tuned merge sort that I use for some
>> statistical calculations after allocators get into Phobos. Timsort would
>> make a good additional option. However, I **strongly** recommend against
>> **replacing** a sort that does not require O(n) extra space with one
>> that does.
>>
>
> Take a look at the graphic here:
> http://cl.ly/3L1o111u1M232q061o2N
>
> That's the extra step that I propose adding to Timsort, if it were
> implemented in Phobos. By doing a single range swap, you can reduce the
> space requirement from O(n/2) to O(n/4). My algorithm utilizes it so
> that only O(n/1024) additional space is required.
>
> Timsort would still use up to O(n/2) space, but if it failed to allocate
> enough memory, it could resort to the range swap instead.
>
> I'll leave it up to the community to decide what's best. But if the
> stable sort proves to be faster, has a worst case of O(n log n), and can
> succeed despite failing to allocate additional space, what purpose is
> there in keeping the unstable sort?
Would be great to provide a better implementation of both stable and
unstable sort. The part about requiring extra memory may be problematic,
but we may use e.g. an additional policy parameter to decide on that.
One thing I find confusing about the "range swap" operation that you
rely on and mention in http://cl.ly/3L1o111u1M232q061o2N (and in the
description of xinok sort which I skimmed without being able to
understand) is that the example does not reflect the generality alluded
to in the text. For example, the circled ranges at the top satisfy three
conditions:
1. They are adjacent
2. Each is sorted
3. Each element in the left range is greater than any element in the
right range
4. They have the same size (four elements each)
It is unclear how many of these conditions must be _actually_ satisfied
for range swap to work. Are all needed? The use of "half" and "merged"
is also quite casual. I wish a clearer description were available.
At any rate, if a provably better algorithm than what's in Phobos is
available, we should include it in Phobos. I'm glad people find the
necessity of a faster stable sort algorithm - it reflects maturity and
sophistication in the community.
Andrei
More information about the Digitalmars-d
mailing list