std.experimental.collections.rcstring and its integration in Phobos

Wed Jul 18 16:00:23 UTC 2018

On Wednesday, July 18, 2018 12:15:52 Seb via Digitalmars-d wrote:
> Well, the problem of it being a range of char is that this might
> lead to very confusing behavior, e.g.
>
> "ä".rcstring.split.join("|") == �|�
>
> So we probably shouldn't go this route either.

I don't know. I'm fine with it not being a range and leaving it up to the
programmer, but part of the point here is that the programmer needs to
understand Unicode well enough to be able to do the right thing in cases
like this or they're screwed anyway. And if strings (of any variety) operate
as ranges of code units by default, the fact that there's a problem when
someone screws it up is going to be a lot more obvious.

Forcing people to call a function like by!char or by!dchar still requires
that they deal with all of this. It just makes it explicit. And that's not
necessarily a bad idea, but if someone is going to be confused by something
like split splitting in the middle of code points, they're going to be in
trouble with the by function anyway.
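
To make the code unit vs. code point distinction concrete, here is a minimal
sketch using plain D strings and std.utf rather than rcstring (whose by!char /
by!dchar API is still only a proposal). The helper names codeUnits and
codePoints are hypothetical and exist only for this illustration.

```d
import std.exception : assertThrown;
import std.range : walkLength;
import std.utf : byCodeUnit, byDchar, validate, UTFException;

// Hypothetical helpers for illustration; not part of Phobos or rcstring.
size_t codeUnits(string s) { return s.byCodeUnit.walkLength; }
size_t codePoints(string s) { return s.byDchar.walkLength; }

void main()
{
    // U+00E4 ("ä") is encoded in UTF-8 as the two code units 0xC3 0xA4.
    string s = "\u00E4";

    assert(codeUnits(s) == 2);   // viewed as a range of char
    assert(codePoints(s) == 1);  // viewed as a range of dchar

    // Slicing between the two code units leaves invalid UTF-8 behind,
    // which is what a split at the code unit level would hand back.
    assertThrown!UTFException(validate(s[0 .. 1]));
}
```

Whether the range is obtained implicitly or through an explicit call, the
programmer still has to know which of the two views a given algorithm needs.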

> The idea of adding overloads was to introduce a bit of
> user-convenience, s.t. they don't have to say
>
> readText("foo".rcstring.by!char)
>
> all the time.

They wouldn't be doing anything that verbose anyway. In that case, you'd just
pass the string literal. At most, they'd be doing something like

readText(str.by!char);

And of course, readText is definitely _not_ @nogc. But regardless, these are
functions that are designed to be generic and take ranges of characters
rather than strings, and adding overloads for specific types just because we
don't want to call the function to get a range over them seems like it's
going in totally the wrong direction. It means adding a lot of overloads,
and we already have quite a mess thanks to all of the special-casing that we
have to avoid auto-decoding without getting into adding yet another set of
overloads for rcstring. We've put in the effort to genericize these
functions and make many of these functions work with ranges of characters
rather than strings, and I really don't think that we should start adding
overloads for specific string types just because we don't want to treat them
as ranges directly.
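
As a rough sketch of what "generic over ranges of characters" means in
practice: the function below is written once against any input range whose
elements are characters, so no per-type overload is needed. countSpaces is a
hypothetical helper, not a Phobos function, and std.utf.byCodeUnit / byDchar
on a plain string stand in for the proposed by!char / by!dchar on rcstring.

```d
import std.range.primitives : isInputRange, ElementEncodingType;
import std.traits : isSomeChar;
import std.utf : byCodeUnit, byDchar;

// Hypothetical helper for illustration: counts ASCII spaces in any
// range of characters, whatever the character width.
size_t countSpaces(R)(R r)
if (isInputRange!R && isSomeChar!(ElementEncodingType!R))
{
    size_t n;
    foreach (c; r)
    {
        if (c == ' ')
            ++n;
    }
    return n;
}

void main()
{
    string s = "a b \u00E4";

    // The same template handles a range of char and a range of dchar.
    assert(countSpaces(s.byCodeUnit) == 2);
    assert(countSpaces(s.byDchar) == 2);
}
```

A string type that exposes such ranges works with a function like this as-is;
the only cost to the caller is the explicit call that produces the range.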

I'd honestly rather see an rcstring type that was just treated as a range of
char than see us adding overloads for rcstring. That's what arrays of char
should have been treated as in the first place, and we already have to do
stuff like byCodeUnit for strings anyway, so having to do by!char or
by!dchar really doesn't seem like a big deal to me - especially if the
alternative is adding a bunch of overloads.

- Jonathan M Davis