Should Phobos use Timsort instead? (with a small tweak)

Andrei Alexandrescu SeeWebsiteForEmail at erdani.org
Mon Nov 7 19:44:22 PST 2011


On 11/7/11 9:28 PM, Xinok wrote:
> On 11/7/2011 7:47 PM, dsimcha wrote:
>> Phobos definitely needs a decent stable sort. I had been meaning to
>> contribute a very highly tuned merge sort that I use for some
>> statistical calculations after allocators get into Phobos. Timsort would
>> make a good additional option. However, I **strongly** recommend against
>> **replacing** a sort that does not require O(n) extra space with one
>> that does.
>>
>
> Take a look at the graphic here:
> http://cl.ly/3L1o111u1M232q061o2N
>
> That's the extra step that I propose adding to Timsort, if it were
> implemented in Phobos. By doing a single range swap, you can reduce the
> space requirement from O(n/2) to O(n/4). My algorithm utilizes it so
> that only O(n/1024) additional space is required.
>
> Timsort would still use up to O(n/2) space, but if it failed to allocate
> enough memory, it could resort to the range swap instead.
>
> I'll leave it up to the community to decide what's best. But if the
> stable sort proves to be faster, has a worst case of O(n log n), and can
> succeed despite failing to allocate additional space, what purpose is
> there in keeping the unstable sort?

Would be great to provide a better implementation of both stable and 
unstable sort. The part about requiring extra memory may be problematic, 
but we may use e.g. an additional policy parameter to decide on that.

One thing I find confusing about the "range swap" operation that you 
rely on and mention in http://cl.ly/3L1o111u1M232q061o2N (and in the 
description of xinok sort which I skimmed without being able to 
understand) is that the example does not reflect the generality alluded 
to in the text. For example, the circled ranges at the top satisfy three 
conditions:

1. They are adjacent

2. Each is sorted

3. Each element in the left range is greater than any element in the 
right range

4. They have the same size (four elements each)

It is unclear how many of these conditions must be _actually_ satisfied 
for range swap to work. Are all needed? The use of "half" and "merged" 
is also quite casual. I wish a clearer description were available.

At any rate, if a provably better algorithm than what's in Phobos is 
available, we should include it in Phobos. I'm glad people find the 
necessity of a faster stable sort algorithm - it reflects maturity and 
sophistication in the community.


Andrei


More information about the Digitalmars-d mailing list