sort API design

Thu Nov 21 23:28:48 UTC 2019

On Thursday, November 21, 2019 3:19:45 PM MST Johannes Riecken via 
Digitalmars-d wrote:
> On Wednesday, 20 November 2019 at 09:33:25 UTC, Johannes Riecken
>
> wrote:
> > I had looked at how to do case-insensitive string sorting with
> > Phobos and without looking at the documentation much, I went
> > with the following code and then wrongly concluded that `<`
> > does case-insensitive comparisons on strings by default:
> >
> > ```
> > import std.algorithm;
> > import std.stdio;
> > import std.string;
> >
> > void main() {
> >
> >     auto arr0 = ["foo", "bar", "Baz"];
> >     auto case_sensitive   = sort!((a, b) => a           < b
> >
> >    )(arr0);
> >
> >     auto case_insensitive = sort!((a, b) => a.toLower() <
> >
> > b.toLower())(arr0);
> >
> >     writeln(case_sensitive);
> >     writeln(case_insensitive);
> >
> > }
> > ```
> > Output:
> > ```
> > ["bar", "Baz", "foo"]
> > ["bar", "Baz", "foo"]
> > ```
> >
> > Later I learned that I was trapped by `sort` being in-place but
> > still returning a value. If this API were to be designed from
> > scratch, I wonder if it would be possible to rule out this
> > incorrect usage at compile-time while still retaining the other
> > nice features of the `sort` function like being in-place and
> > giving strong types which can be used to choose faster
> > algorithms to further transform the result, etc.? Or are there
> > any ideas in programming language research that would help with
> > such a case?
>
> Thanks, that's some nice ideas in this thread! I should
> definitely get into the habit of using const wherever possible.
> Jonathan, I think a major selling point of Phobos and D is that
> in lots of cases it can catch erroneous API usage at compile time.

There's nothing erroneous about sorting a range in two different ways
(though it obviously wouldn't make sense to do so twice in row without doing
anything with the range in between). And having sort return a new range that
was sorted while not affecting the one that was passed in would require
either sorting lazily (which really doesn't work) or allocating an entirely
new range in order to sort into - neither of which makes much sense. So, I'm
honestly very surprised that anyone would think that calling sort would
result in a new range without affecting the one that was passed in.

Regardless, the documentation makes the behavior of sort clear, and anyone
using a function needs to read its documentation if they don't want to run
into problems. If you don't read documentation, you're going to make bad
assumptions about how a function behaves eventually - if not frequently -
and end up with bugs in your code.

It is of course nice when the language is able to catch incorrect behavior
for you at compile time, but I don't see how that could be done here. Sure,
you used the function incorrectly, and if sort allocated a new range to sort
into and returned that, you wouldn't have been bitten. However, pretty much
every sort function I have ever seen sorts in-place, so I would expect most
folks to assume that sort sorted the range in-place. And if sort returned a
newly allocated range, anyone making that assumption and used sort without
reading the documentation would end up with buggy code and could have
complaints similiar to yours. Whether sort returns a newly allocated range
or sorts in-place, _someone_ is going to make the wrong assumption about how
it works and get bitten without the compiler being able to help them out.
And the current implementation is by far more efficient than the version
that returned a newly allocated range would be. It's also more flexible,
since it allows you to sort the same range multiple times, multiple ways if
that makes sense for the code. Anyone who wants to get a new range can
simply allocate a new array to hold the data - either via .dup if the range
to be sorted is an array or by calling save on the range and passing it to
std.array.array, making your code something like

    auto case_sensitive   = sort!((a, b) => a < b)(arr0.dup);
    auto case_insensitive =
        sort!((a, b) => a.toLower() < b.toLower())(arr0.dup);

So, I'm sorry that you gotten bitten by sort not working the way you
expected, but I don't see how that could have been prevented without
implementing the range in the way that you expected, which would then screw
over anyone who assumed that the range would be sorted in-place. The
compiler can't fix this for you. Ultimately, you have to read the
documentation. And given the inefficiency of returning a sorted range
without affecting the one that was passed in, the current implementation
makes far more sense than how you assumed that sort worked.

Even if D is able to catch more problems at compile time than another
language, there's still a limit to how many bugs it can catch. There's a
reason that unit tests are a feature that's built into D.

- Jonathan M Davis