Should Phobos use Timsort instead? (with a small tweak)

Xinok xinok at live.com
Mon Nov 7 20:43:57 PST 2011


On 11/7/2011 10:44 PM, Andrei Alexandrescu wrote:
> On 11/7/11 9:28 PM, Xinok wrote:
>> On 11/7/2011 7:47 PM, dsimcha wrote:
>>> Phobos definitely needs a decent stable sort. I had been meaning to
>>> contribute a very highly tuned merge sort that I use for some
>>> statistical calculations after allocators get into Phobos. Timsort would
>>> make a good additional option. However, I **strongly** recommend against
>>> **replacing** a sort that does not require O(n) extra space with one
>>> that does.
>>>
>>
>> Take a look at the graphic here:
>> http://cl.ly/3L1o111u1M232q061o2N
>>
>> That's the extra step that I propose adding to Timsort, if it were
>> implemented in Phobos. By doing a single range swap, you can reduce the
>> space requirement from O(n/2) to O(n/4). My algorithm utilizes it so
>> that only O(n/1024) additional space is required.
>>
>> Timsort would still use up to O(n/2) space, but if it failed to allocate
>> enough memory, it could resort to the range swap instead.
>>
>> I'll leave it up to the community to decide what's best. But if the
>> stable sort proves to be faster, has a worst case of O(n log n), and can
>> succeed despite failing to allocate additional space, what purpose is
>> there in keeping the unstable sort?
>
> Would be great to provide a better implementation of both stable and
> unstable sort. The part about requiring extra memory may be problematic,
> but we may use e.g. an additional policy parameter to decide on that.
>
> One thing I find confusing about the "range swap" operation that you
> rely on and mention in http://cl.ly/3L1o111u1M232q061o2N (and in the
> description of xinok sort which I skimmed without being able to
> understand) is that the example does not reflect the generality alluded
> to in the text. For example, the circled ranges at the top satisfy three
> conditions:
>
> 1. They are adjacent
>
> 2. Each is sorted
>
> 3. Each element in the left range is greater than any element in the
> right range
>
> 4. They have the same size (four elements each)
>
> It is unclear how many of these conditions must be _actually_ satisfied
> for range swap to work. Are all needed? The use of "half" and "merged"
> is also quite casual. I wish a clearer description were available.
>
> At any rate, if a provably better algorithm than what's in Phobos is
> available, we should include it in Phobos. I'm glad people find the
> necessity of a faster stable sort algorithm - it reflects maturity and
> sophistication in the community.
>
>
> Andrei

1. Yes. As mentioned in the graphic, the key is to move the smallest 
values to the left half, and the largest values to the right half. Since 
in a sorted list, the smallest values are at the beginning, and the 
largest values are at the end, the two ranges to swap will always be 
adjacent.

2. Yes. As you know, merge sort works by recursively merging two sorted 
lists into one sorted list. Merge sort generally requires O(n) space, or 
O(n/2) space if you only make a copy of the smaller half. The graphic 
intends to show how range swap reduces the space requirement. Instead of 
doing one large merge, you do two smaller merges.

3. Yes, see #1.

4. Yes. The "range swap" literally swaps two ranges of elements, so they 
must be equal in length.


If you're still confused, think of it like this:

In quick sort, you choose a pivot value, then you move all smaller 
values to the left of the pivot, and all larger values to the right of 
the pivot. Once you do this, there's no need to move any value in the 
left half to the right half, and vice versa.

Similarly, range swap moves all the smallest values to the left half, 
and all the largest values to the right half, so there's no need to move 
values between the two halves.


More information about the Digitalmars-d mailing list