ranges of characters and the overabundance of traits that go with them

Jonathan M Davis via Digitalmars-d digitalmars-d at puremagic.com
Mon Mar 13 15:54:33 PDT 2017


On Monday, March 13, 2017 12:08:12 H. S. Teoh via Digitalmars-d wrote:
> 2) The /* user-readable constraints */  ought to be one of a small
> number of self-describing templates that tell the user exactly what the
> *intent* of the function is (note, *intent*, as in, implementation
> limitations should not be a factor).
>
> The current mess of isSomeString, isConvertibleToString, isNarrowString,
> etc., really should be internal to Phobos, and should not be exposed to
> the user at all. At the most, I'd say just ONE template that works for
> all strings and string-like ranges (and whatever else we wish to
> include) ought to be exposed to the user.  What should be done within
> Phobos itself (in private std.* space) is another matter -- we do need
> to handle the dirty details like auto-decoding, narrow strings, etc.,
> here. But the point is that this should be *internal* to Phobos, not
> exposed to the user like dirty laundry.

Simplifying public-facing template constraints makes a lot of sense.
However, that doesn't mean that all of the various traits should be private
or hidden within Phobos - just that the public functions themselves should
make their template constraints as general as possible. Remember that
there's nothing special about what Phobos is doing. It's just that it
happens to be the standard library, so it's used by a lot of folks. The rest
of the D ecosystem has all of these same concerns, and so even if 3rd party
libraries and programs also make their public-facing template constraints as
simple as possible, they're still going to need to use all of these traits
internally in the same way that Phobos does. These complexities can be
hidden at the API level to some extent, but anyone writing APIs - or writing
code that uses APIs but needs to do things that they don't do for them - is
still going to have to worry about these complexities.
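
For instance, something along these lines (purely an illustrative sketch with
made-up names, not actual Phobos code) keeps the public-facing constraint
simple while the more specialized traits only show up in the implementation:

import std.range.primitives : ElementType, empty, front, isInputRange,
                              popFront;
import std.traits : isNarrowString, isSomeChar;

// Public API: the constraint just says "an input range of characters."
size_t countSpaces(R)(R range)
    if(isInputRange!R && isSomeChar!(ElementType!R))
{
    return countSpacesImpl(range);
}

// Internal implementation: this is where the messier traits live - e.g.
// skipping auto-decoding for narrow strings, which is fine here, since
// counting an ASCII character works at the code unit level.
private size_t countSpacesImpl(R)(R range)
{
    static if(isNarrowString!R)
    {
        import std.utf : byCodeUnit;
        auto r = range.byCodeUnit();
    }
    else
        alias r = range;

    size_t count;
    for(; !r.empty; r.popFront())
    {
        if(r.front == ' ')
            ++count;
    }
    return count;
}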

> For the rest of the cases, my question is, is there any reason to *not*
> accept any arbitrary range of characters?

I'm not convinced that it never makes sense to have a function that just
operates on arrays and not ranges. Certainly, lots of programs will be
written that way, because it takes more effort to make something work with
generic ranges than it does to make it work with arrays, and the extra
effort often isn't worth it if you're not creating a library to distribute
to other programmers.

For the most part, something like Phobos should be using ranges, not just
arrays, but array-based functions are still a real thing for the D ecosystem
at large - especially when the code mixes range behavior with appending,
which is not an uncommon thing to do in string code. So, while I do think
that we should be very wary of adding code to Phobos that operates on arrays
and not ranges, we should also remember that operating on generic ranges is
not always the best way to do things in actual programs that are written to
do a specific thing - potentially on a tight schedule - rather than being
generic and put out in the wild for everyone to use.
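
For example (again, a purely hypothetical function), something like this
mixes range-based algorithms with appending, and the appending only works
because the buffer is an actual array:

char[] ensureTrailingSlash(char[] path)
{
    import std.algorithm.searching : endsWith;

    // endsWith treats the array as a range of characters...
    if(!path.endsWith('/'))
        path ~= '/'; // ...whereas ~= relies on it actually being an array.
    return path;
}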

> 4) Implicit conversions:  this is a tricky one, esp. once you factor in
> alias this.  Am I right to assume that this is mostly coming from
> functions that used to take strings explicitly, but later were
> templatized, thus losing some of the original implicit conversions?  If
> so, I'm inclined to do this:
>
>   auto myFunc(R)(R r)
>       /* N.B. no sig constraints */
>   {
>       static if (isConvertibleTo!(R,x))
>           ...
>       ... /* handle dirty laundry cases internally here */
>   }
>
> then document in the ddocs exactly what is expected of the incoming
> type. I.e., accept everything, then sort out the different cases
> internally as an implementation detail.

As I pointed out in my original post (though with how large it is, it
probably isn't hard to forget), this doesn't actually work properly with
implicit conversions - at least not in the general case. In particular, if
you have a function that takes a string or takes an array and is templated
on character type, it will currently accept static arrays, enums, and
user-defined types that implicitly convert to string. For static arrays, and
user-defined types that convert to strings in a similar manner to static
arrays (i.e. they return a slice of memory that will not be valid once the
object being converted goes out of scope), you have a safety problem if you
do the implicit conversion inside the function. It needs to happen at the
call site, and that means _not_ doing something like

auto foo(T)(T t)
    if(isConvertibleToString!T)
{...}

- at least not in the general case. The problem is when any portion of the
original range is returned from the function. For instance, with this
ridiculously simple example

auto foo(inout(char)[] str)
{
    return str;
}

char[10] sa;
auto bar = foo(sa);

you have no safety problem so long as no slice of bar is returned from the
function that it's in. However, if you templatize foo

auto foo(R)(R range)
    if(isForwardRange!R && isSomeChar!(ElementType!R))
{
    static if(isConvertibleToString!R)
        return foo!(StringTypeOf!R)(range);
    else
        return range.save;
}

char[10] sa;
auto bar = foo(sa);

then sa is copied into foo and _then_ sliced, so bar ends up referring to
memory inside of foo's stack frame - which is gone once foo returns and is
therefore invalid. You can make it somewhat safer by doing

auto foo(R)(R range)
    if(!isConvertibleToString!R &&
       isForwardRange!R &&
       isSomeChar!(ElementType!R))
{
    return range.save;
}

auto foo(T)(auto ref T convertible)
    if(isConvertibleToString!T)
{
    return foo!(StringTypeOf!T)(convertible);
}

but as soon as the value being passed to be implicitly converted is an
rvalue rather than an lvalue, you have the same problem. That's less likely
with a static array (though still possible), but it's also quite possible
with a user-defined type. e.g.

struct S
{
    char[20] str;
    const(char)[] func() { return str[]; }
    alias func this;
}
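
With that pair of overloads, if an S is passed as an rvalue, auto ref binds
it by value, the implicit conversion then slices the copy sitting in foo's
stack frame, and the slice that comes back out of foo is dangling. e.g.
(purely illustrative)

// S.init is an rvalue, so the copy lives in foo's stack frame, the alias
// this conversion slices that copy, and the slice that foo returns refers
// to memory that's no longer valid once foo returns.
auto dangling = foo(S.init);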

The only way for this to be safe is for the implicit conversion to take
place at the call site like it did originally, and the only way that I can
come up with to guarantee that the conversion takes place at the call site
is to templatize on the character type (or element type if we're talking
about arrays in general and not just strings), in which case we get
something like

auto foo(R)(R range)
    if(!isSomeString!R &&
       isForwardRange!R &&
       isSomeChar!(ElementType!R))
{
    return _fooImpl(range);
}

auto foo(C)(C[] str)
    if(isSomeChar!C)
{
    return _fooImpl(str);
}

private auto _fooImpl(R)(R range)
    if(isForwardRange!R && isSomeChar!(ElementType!R))
{
    return range.save;
}

Now, in this case, the code is so short that you might as well just
duplicate the functionality rather than create a helper function, but in
general, you'd want the helper function to avoid code duplication.

Regardless, this solution correctly and safely converts the original
function from one that takes a string to one that works with generic ranges
of characters without introducing any safety problems. But it means that two
overloads are required, and it means that we can't deprecate the implicit
conversion functionality like Jack Stouffer suggested (because both strings
and the types that convert to them take the same overload). This
complication is a great example of why we do _not_ want to be writing new,
generic functions that accept types which implicitly convert to the types
that the function actually works with. However, we're stuck if we want to
templatize an existing function so that it works on generic ranges rather
than just arrays - or generic ranges of characters rather than just strings.
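
For illustration (hypothetical usage, not code from the original post), with
that pair of overloads the conversions happen at the call site, so the
returned slices refer to memory that the caller still owns:

char[10] sa;
// sa is sliced here, at the call site, when it's passed to the C[]
// overload, so the result refers to the caller's static array, which is
// still in scope.
auto bar = foo(sa);

S s;
// Likewise, the alias this conversion happens at the call site, so the
// result refers to s.str in the caller's frame rather than to a copy
// inside of foo.
auto baz = foo(s);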

- Jonathan M Davis


