More complexity creep in Phobos

Jonathan M Davis newsgroup.d at jmdavisprog.com
Sat Mar 30 19:22:37 UTC 2019


On Wednesday, March 27, 2019 8:17:35 PM MDT Andrei Alexandrescu via 
Digitalmars-d wrote:
> I started an alphabetical search through std modules, and as luck would
> have it I got as far as the first module: std/algorithm/comparison.d. In
> there, there's these overloads:
>
> size_t levenshteinDistance(alias equals = (a,b) => a == b, Range1, Range2)
> (Range1 s, Range2 t)
> if (isForwardRange!(Range1) && isForwardRange!(Range2));
>
> size_t levenshteinDistance(alias equals = (a,b) => a == b, Range1, Range2)
> (auto ref Range1 s, auto ref Range2 t)
> if (isConvertibleToString!Range1 || isConvertibleToString!Range2)
>
> (similar for levenshteinDistanceAndPath)
>
> What's with the second overload nonsense? The Levenshtein algorithm
> works on forward ranges. And that's about it, in a profound sense:
> Platonically what the algorithm needs is two forward ranges to operate.
> (We ought to be pretty proud of it, too: in all other languages I looked
> at, it's misimplemented to require random access.)
>
> The second overload comes from a warped sense of DWIM: well if this type
> converts to a string, we commit to support that too. Really. How about a
> struct that converts to an array? Should we put in a pull request to
> that, too? Where do we even stop?
>
> I hope there's not much more of this nonsense, because it all should be
> deprecated with fire.

This particular case is one that's a bit tricky. The correct solution with
stuff like this IMHO is to require that you pass actual forward ranges and
not types that convert to types that are forward ranges (like strings). The
problem is that a lot of code like this was originally written to take
strings and then generalized later, and when a function takes something like
string or const(char)[], it's going to implicitly convert varous types to
the target type at the call site. In contrast, you get no such implicit
conversion with a template constraint.

The result is that a bunch of functions in Phobos now use
isConvertibleToString on a separate overload to then convert the argument to
a string and pass it to the primary function. This is very, very wrong even
if it's not obvious. The problem is that the conversion then happens inside
the function instead of at the call site, which means that you get fun stuff
like when a static array is passed in, it's sliced, and if the function
returns any portion of the range it's given, then you're returning a slice
of the stack that is then invalid. The only way to make this work is to have
two overloads - the primary one, and one that accepts arrays specifically.
e.g.

auto foo(R)(R range)
    if(isForwardRange!R && ...)
{
    return fooImpl(range);
}

auto foo(C)(C[] str)
{
    return fooImpl(range);
}

private auto fooImpl(R)(R range)
{
}

That way, the conversion happens at the call site like it did when the
function accepted string. Similar problems exist in general with accepting
implicit conversions with a template constraint and is generally not
something that we should be doing unless it's for something really simple
like bool.

This is not at all a nice way to have to write this code, but it is required
to avoid code breakage when making the function work on ranges rather than
just strings. isConvertibleToString _could_ be used internally with static
if to avoid the additional overload, but again, it runs into the problem
that the implicit conversion is done internally. So, to support the implicit
conversion, a second overload is required.

The way that these functions _should_ have been written was to have them
operate on ranges in the first place, and then they wouldn't accept any
implict conversions from stuff like enums, but many of these functions
weren't. I don't know if that's quite the problem with levenshteinDistance,
but it's definitely causing a lot of the ugly overloads in Phobos with
anything string-related.

Unfortunately, I don't think that it's actually possible to just deprecate
the overload that accepts implicit conversions, because that's also the
overload that accepts strings. So, for many of these functions, I think that
we're stuck with having two overloads unless we do something like std.v2 and
have more of a hard breakage.

Either way, isConvertibleToString must go. Using it in a template constraint
is just begging for bugs, because the implicit conversion won't happen at
the call site like it needs to. I've done some work towards removing it from
Phobos, but due to health issues, I haven't had enough time to finish the
job.

- Jonathan M Davis





More information about the Digitalmars-d mailing list