ranges of characters and the overabundance of traits that go with them

Thu Mar 16 00:44:59 PDT 2017

On Wednesday, March 15, 2017 10:17:41 H. S. Teoh via Digitalmars-d wrote:
> On Tue, Mar 14, 2017 at 07:21:05PM -0700, Jonathan M Davis via 
Digitalmars-d wrote:
> I'd argue that the same reasoning applies to all D libraries, not just
> Phobos: implementation details like optimizing for narrow strings ought
> to be hidden from the public API.  It's the library author's problem to
> figure out how to optimize his library -- Phobos code, after all, is
> publicly available for perusal and its license terms permit copying
> Phobos snippets for use in their own library code, if they find that
> they need, e.g., some form of isNarrowString.  But none of this should
> be a factor in the design of the user-facing API.  Phobos needs to set
> the example here.
>
> And to counter the argument that library authors might want to use
> Phobos' version of isNarrowString: I'd say that it's better for them to
> copy the definition of isNarrowString into their own code, because the
> current mess of isNarrowString, isSomeString, et al, means that there's
> a high likelihood we'd want to revise their definitions, which in turn
> means that if they're being used in 3rd party library code, subtle
> breakages may be introduced into said libraries because the definition
> has changed from when the library author originally looked at it.
>
> That's why, IMO, traits like isNarrowString that are only useful for
> library implementation details ought NOT to be exported by Phobos.
> Unless we are 100% sure that it has a rock-solid definition that's
> well-designed and fits well with other standard traits -- which excludes
> most (all?) of the current messy string-related traits.

I completely agree that we should try and make the template constraints on
public functions in Phobos - and other libraries be simple - and try and
restrict a lot of the more complicated stuff to internals. However, I do not
agree for a second that we should be making all of those traits private and
telling folks to copy-paste them if they want to use them. If a trait is bad
enough that we think that it should not exist, then we should look at
deprecating it, but if it's truly useful in Phobos, then it's going to be
useful in 3rd party code, and it should be publicly available - even more so
if it's currently publicly available. Telling folks that we want to
deprecate something to make it private and that they should copy-paste it if
they want it just seems rude to me.

The reality of the matter though is that if you're talking about traits like
isSomeString and isNarrowString, they've been around as long as ranges - if
not longer - and I don't think that it's particularly reasonable to
deprecate them at this point. There's going to be a _lot_ of code out there
using them, and most of it is likely using them correctly. The only real
problem with them is that they accept enums, so if you use them with
non-range based code, and you're not careful, then you could have a problem
if someone tries to use an enum with them. But while that's definitely a
problem, for a lot of code, it simply won't matter. So, if we deprecated
isSomeString, we'd be telling a lot of folks to change their code when it's
perfectly fine as-is. I agree that it would be nice to fix isSomeString, but
I just don't think that it's reasonable to deprecate it at this point.

isAutodecodableString and isConvertibleToString, on the other hand, are
rather new. So, not much code is using them - either in Phobos or in 3rd
party code - and we won't affect much if we remove them. They're also highly
questionable in general (whereas aside from the enum issue, isSomeString and
isNarrowString are fine). Using either of them is an @safety risk. So, those
two should definitely go IMHO.

In general though, I don't think that Phobos is anything special beyond the
fact that almost everyone is going to be using it and that ideally, it would
be an example of how to write good D code. Any public D library is in a boat
similar to Phobos with regards to what it needs to do with templates, and if
a trait is useful in Phobos in general, it makes more sense to make it
generally available than to hide it. And if we're talking about something
that's already public, I don't think that it makes sense to make it private.
If it's bad enough for that to be a real consideration, then not even Phobos
should be using it.

> Is it possible to hide this stuff behind a "template wall"? I.e., rename
> the current overload set to xxxImpl() with whatever sig constraints are
> necessary to achieve what we want, and have the original xxx() be just a
> generic template that forwards the said overloads?  E.g.:
>
> Change:
>
>   auto xxx(R)(R r) if (isConvertibleToString!R) { ... }
>   auto xxx(R)(R r) if (isAutodecodableString!R) { ... }
>   auto xxx(R)(R r) if (/* etc. */) { ... }
>
> To:
>
>   auto xxxImpl(R)(R r) if (isConvertibleToString!R) { ... }
>   auto xxxImpl(R)(R r) if (isAutodecodableString!R) { ... }
>   auto xxxImpl(R)(R r) if (/* etc. */) { ... }
>
>   auto xxx(R)(R r) if (isInputRange!R) { return xxxImpl(r); }
>
> Or does that extra level of syntactic indirection cause problems with
> implicit conversions?

While it will work in some cases to let the conversion take place inside the
function, in others, it's wrong code in comparison to the original version
that took a string. In particular, if the function returns a slice of the
original string, with the original code, if a static array was passed to it,
then it would have been sliced at the call site, and the string that was
returned from the function was a slice of the original static array.
However, if you templatize it and do the conversion inside the function,
then you end up slicing the copy inside the function and returning a string
that refers to memory inside the function that just returned (and is thus
invalid). Using auto ref reduces the problem but doesn't fix it. e.g.

auto foo(inout(char)[] str)
{
    return str[0 .. 5];
}

char[10] sa;
auto arr = foo(sa);

becomes wrong code if you do

auto foo(R)(R range)
    if(isRandomAccessRange!R && hasSlicing!R && isSomeChar!(ElementType!R))
{
    return range[0 .. 5];
}

auto foo(T)(T convertible)
    if(isConvertibleToString!T)
{
    // This will now return a slice of a variable local to this function
    // whereas before the conversion occured at the call site, and the
    // memory that was referred to was from outside this function.
    return foo!(StringTypeOf!T)(convertible);
}

char[10] sa;
auto arr = foo(sa);

auto ref reduces the problem, but you're just as screwed as soon as an
rvalue is passed instead of an lvalue. The only way to have the same
semantics and avoid the problem is to force the conversion at the call site,
and the only way that I know how to do that is to templatize on the
character type. e.g.

auto foo(R)(R range)
    if(!isSomeString!R &&
       isRandomAccessRange!R &&
       hasSlicing!R &&
       isSomeChar!(ElementType!R))
{
    return _fooImpl(range);
}

auto foo(C)(C[] str)
    if(isSomeChar!C)
{
    return _fooImpl(str);
}

private auto _fooImpl(R)(R range)
{
    return range[0 .. 5];
}

If you templatize on the full type like you normally would, then the
conversion takes place inside the function, and it doesn't work. In order to
fix that, we'd need some way to put a constraint on the argument type and
yet have the parameter type be something different - e.g. require that the
argument by convertible to string but have the actual parameter type be
something like inout(char)[] or have its own template type so that it could
be something like C[] and take an arbitrary character type. Regardless,
templates simply don't work like that in D in general, let alone with IFTI.

All of this isn't a problem in the same way if the function being
templatized does not return a slice of the original string, but without auto
ref, it does still result in additional copies in comparison to the
original, and it would be pretty easy to screw it up. I expect that Walter's
@safety changes will make finding the @safetr problems with such implicit
conversions easier, but the problem remains that in the general case, you
really need to be doing the conversion at the call site in order to avoid
@safety problems.

> > As for isSomeString, the only problem I see with it is that it accepts
> > enums. Otherwise, it seems like a perfectly sensible trait to have,
> > and its name fits in with other traits in std.traits such that I don't
> > know what we'd call it instead other than isExactSomeString (which
> > std.conv does internally, but it's not exactly an ideal name).
> > Regardless, I completely disagree about trying to get rid of it or
> > hiding it. If it doesn't need to be in a template constraint, then it
> > shouldn't be, but it's still a useful trait to have.
>
> I don't like the name, because it's unclear what exactly it means and
> what exactly it includes. The fact that it accepts enums where other
> traits don't is a further code smell IMO.
>
> I think what we really need to do is to sit down and work out a
> well-designed, orthogonal set of traits that might be useful for user
> code, independently of what the current traits are, and then (and only
> then) worry about how to map the current mess to them.
>
> If I were to look at the current traits from a user's POV, I'd be going,
> "What, why does Phobos have isNarrowString, isSomeString,
> isAutodecodableString...  How many different kinds of strings does D
> have?! and how exactly do any of these traits map to string, wstring,
> dstring? WTH does "Some" in "isSomeString" refer to? Is it different
> from "isAnyString" and "isString"? What kind of crappy ad hoc trait
> design is this anyway?!" Not the kind of reaction we'd want newcomers to
> have with code that's supposed to set the standard for what D code
> should look like.

If it were just isSomeString and isNarrowString, I don't think that it would
be that big a deal. While isNarrowString is a new concept, it's pretty
integral to ranges at this point, and isSomeString really isn't a new
concept. So, with just the two of them, I don't think that there's really a
problem (which is where we used to be).

As for the name, is isSomeString is not the only trait in std.traits with
some in the name, and it makes sense given that isString would imply
is(T == string). isAnyString might have been fine too, but that's not what
we ended up with. If we want to replace isSomeString with a trait that
didn't accept enums, then isAnyString would make sense, but then it really
would be increasing the confusion, and it wouldn't fit with the current
naming scheme in std.traits. If anything, the issues with the number of
string-related traits implies that we should simply document the issue with
isSomeString and not introduce a replacement. And given how old it is, a lot
of code is likely to exist which uses it, so I think that it's quite
questionable that we can replace it.

The real problem is that we have isAutodecodableString and
isConvertibleToString, and as I said before, I think that those should go.
They're new enough that we won't break a lot of code when we deprecate them,
and they're risky to keep. And if we get rid of them, we're down to only two
string-specific traits again, and I don't think that that's all that bad,
particularly when they're both useful and embody concepts that anyone
serious about programming with ranges in D is going to need to understand.
The reality of the matter is that even with stuff like byCodeUnit and
byGrapheme, the fact that strings auto-decode by default requires that your
average D programmer be at least somewhat familiar with those concepts,
which then makes isSomeString and isNarrowString quite straightforward.

> It's clear that autodecoding can't be *removed* outright, at least not
> at this point. But we *can* reduce dependence on it by making it a
> Phobos internal implementation detail rather than exposing it everywhere
> on the public API with traits like isNarrowString.
>
> Let's not conflate cleaning up Phobos' public API with removing Phobos
> internal code or changing Phobos semantics (e.g., the "helper stuff").
> The two ought to be more-or-less independent, and where they are not,
> it's a sign of weakness in the design of the API, and is where we need
> to change things to improve the situation.

I think that we should start off by

1. deprecating isAutodecodableString and isConvertibleToString and fixing
   Phobos so that it doesn't use them.

2. documenting isSomeString and isNarrowString to make it clear that the
   fact that they match enums is a problem.

3. one by one going through the various functions in Phobos and turning
   overloads into internal static ifs where appropriate.

Doing that won't outright fix the problems with the string traits, but it
gets us going back to just two string traits while making the problems with
them clear. And of course, we've already discussed how it would be better to
turn the overloads into static ifs or overloads of internal functions in
order to simplify the public facing constraints - regardless of what's going
on with the string-specific constraints.

I don't agree at all though that any of the currently public-facing traits
should be made private. Making it so that they're not used in public-facing
template constraints when they don't need to be makes a lot of sense, but I
see no reason to hamstring 3rd party libraries just because we don't want to
put the traits in Phobos' public-facing template constraits if we can avoid
it.

- Jonathan M Davis