Regarding implementing a stable sort for Phobos

Tue Mar 13 07:31:59 PDT 2012

On 3/13/12 1:31 AM, Xinok wrote:
> I've been playing with sorting algorithms a lot in recent months, so I
> want to implement a *working* stable sort for Phobos which is broken at
> the moment. I have a working library and I'm still adding to it. It's
> much more complex than a simple merge sort, being over 300 lines of code
> at the moment.

Working is better than broken.

> - It's a natural merge sort, which is faster on partially sorted lists,
> and adds little overhead for mostly random lists.
> - It uses O(log n log n) additional space for merging.

That's 1024 when n is 4 billion. I think you can safely approximate it 
with alloca or a fixed-size stack-allocated array.

> - I wrote it to sort random-access ranges *without* slicing, but I think
> the exclusion of slicing makes it slower. I'm writing a separate
> implementation which uses slicing and I'll keep it if it's much faster.

Having random access implies having slicing.

> - To avoid multiple allocations, the user can allocate their own
> temporary memory and pass it to the sort function.

If you need different allocation strategies, I suggest you make it a 
policy (like stability is).

> - I decided against using pointers. While I could use them to write
> highly optimized code for arrays, pointers can't be used in safe code
> and don't work very well in CTFE at the moment.

Perhaps it's best to have two distinct implementations guarded by if 
(__ctfe). The runtime implementation can be @trusted.

> Is it worth keeping the implementation *without* slicing? Many functions
> in Phobos do require slicing, including the unstable sort, and I think
> most random-access ranges do or could support slicing.

No need.

> What would you prefer in terms of memory usage vs performance? O(n/2)
> space is optimal for performance, O(1) (in-place) requires zero
> allocations but is slower, and O(log n log n) provides a good balance.

The latter rounded up to a constant sounds best.

> Should I implement concurrency? Merge sort is very easy to parallelize,
> and being in-place or natural doesn't change this fact.

Let's save that for std.parallel_algorithm.

> Should we take a different path and go for a basic merge sort or even
> Timsort? I've considered writing a Timsort though that seems like
> daunting task to me, so here's an implementation written in C -
> https://github.com/swenson/sort

I don't know how your function's performance profile compares with 
timsort's.

Andrei