Randomness in built-in .sort

Sun Jan 4 19:23:06 PST 2009

On Mon, Jan 5, 2009 at 12:05 PM, dsimcha <dsimcha at yahoo.com> wrote:
> == Quote from Bill Baxter (wbaxter at gmail.com)'s article
>> It seems to me that the built-in .sort uses a randomized algorithm
>> that leads to results that differ from run-to-run when there are
>> elements with identical keys.
>> This seems like a bad idea to me for the built-in algorithm to have
>> non-deterministic behavior because it means you can have bugs that
>> aren't consistently repeatable.
>> I think the built-in sort should be some kind of stable sort.   Also
>> the stability or lack thereof is not mentioned in the spec, and it
>> probably should be because stability is critical for some uses of
>> sorting.  One shouldn't have to guess whether a given implementation
>> of D will use a stable sort or not.
>> --bb
>
> IDK why sorting is builtin anyhow.  I did some benchmarks a while back, and the
> builtin sort is slower than std.algorithm.  IIRC the builtin sort uses function
> pointers for comparison, while the std.algorithm sort uses template parameters
> that can be inlined at compile time.  I know this was mentioned before, but it
> didn't really get fully discussed.  Why is sorting builtin?  With property syntax,
> there's not even any syntactic sugar advantage.  Furthermore, the builtin sort is
> substantially less flexible than a library sort.

I agree.  Builtin sort should probably just go away in D2.
But it's there in D1 and can't be removed now.  The best we can hope
for there is for the specification to be tightened up to specify that
.sort must be a stable algorithm.

> On another note, it would be nice if std.algorithm implemented a stable O(N log N)
> sort, like a merge sort.  Right now, IIRC it uses an interesting stable swapping
> algorithm on top of a quick sort for O(N log^2 N) performance.  This is good when
> space is tight, but I think in most cases a merge sort is better as a stable sort.
>
> On the other hand, if builtin sort is kept, then I agree with you:  It should
> follow the D convention of doing the safe thing by default (stable sort with no
> O(N^2) corner cases) and allowing less safe behavior (unstable but possibly faster
> sort that may or may not have some O(N^2) corner cases) in libraries.

For my purposes, I just found Oskar Linde's stable sort implementation
(based on merge sort), and am using that now.
It supports a delegate as a predicate, too, which made my code a lot
simpler (got rid of a dummy struct who's only purpose was to provide
the necessary opCmp I wanted).

So yeh, there's not much point for built-in .sort (and .reverse)...
except the ever-present "it it's easier to get CTFE working if it's
built-in", because the compiler can have a special case CTFE
implementation of those operations.

--bb