std.experimental.collections.rcstring and its integration in Phobos

Tue Jul 17 18:09:13 UTC 2018

On Tuesday, July 17, 2018 17:28:19 Seb via Digitalmars-d wrote:
> On Tuesday, 17 July 2018 at 16:58:37 UTC, Jonathan M Davis wrote:
> > On Tuesday, July 17, 2018 15:21:30 Seb via Digitalmars-d wrote:
> >> [...]
> >
> > If it's not a range by default, why would you expect _anything_
> > which operates on ranges to work with rcstring directly? IMHO,
> > if it's not a range, then range-based functions shouldn't work
> > with it, and I don't see how they even _can_ work with it
> > unless you assume code units, or code points, or graphemes as
> > the default. If it's designed to not be a range, then it should
> > be up to the programmer to call the appropriate function on it
> > to get the appropriate range type for a particular use case, in
> > which case, you really shouldn't need to add much of any
> > overloads for it.
> >
> > - Jonathan M Davis
>
> Well, there are few cases where the range type doesn't matter and
> one can simply compare bytes, e.g.
>
> equal (e.g. "ä" == "ä" <=> [195, 164] == [195, 164])
> commonPrefix
> find
> ...

That effectively means treating rcstring as a range of char by default
rather than not treating it as a range by default. And if we then do that
only with functions that overload on rcstring rather than making rcstring
actually a range of char, then why aren't we just treating it as a range of
char in general?

IMHO, the fact that so many algorithms currently special-case on arrays of
characters is one reason that auto-decoding has been a disaster, and adding
a bunch of overloads for rcstring is just compounding the problem.
Algorithms should properly support arbitrary ranges of characters, and then
rcstring can be passed to them by calling one of the functions on it to get
an actual range of code units, code points, or graphemes. Either that, or
rcstring should default to being a range of char. Going halfway and making
it work with some functions via overloads really doesn't make sense.
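
To illustrate what asking for the range type explicitly looks like, here is a
rough sketch using what Phobos already provides for string. A plain string
stands in for rcstring (I'm not assuming anything about rcstring's actual
API), and the three helper names are made up for the example:

```d
import std.range : walkLength;
import std.uni : byGrapheme;
import std.utf : byCodeUnit, byDchar;

// Hypothetical helpers. A plain string stands in for rcstring; the point is
// only that the caller picks the level of abstraction explicitly.
size_t codeUnitCount(string s)  { return s.byCodeUnit.walkLength; }
size_t codePointCount(string s) { return s.byDchar.walkLength; }
size_t graphemeCount(string s)  { return s.byGrapheme.walkLength; }

void main()
{
    // 'a' followed by U+0308 COMBINING DIAERESIS, i.e. a decomposed "ä"
    string s = "a\u0308";

    assert(codeUnitCount(s) == 3);  // 1 byte for 'a', 2 bytes for U+0308
    assert(codePointCount(s) == 2); // 'a' and U+0308
    assert(graphemeCount(s) == 1);  // one user-perceived character
}
```

The same text has three different lengths depending on which range you ask
for, which is why a range-based function can't pick one for you without
assuming a default.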

Now, if we're talking about functions that really operate on strings and not
ranges of characters (and thus do stuff like append), then that becomes a
different question, but we've mostly been trying to move away from functions
like that in Phobos.

> Of course this assumes that there's no normalization necessary,
> but the current auto-decoding assumes this too.

You can still normalize with auto-decoding (the code units - and thus code
points - are in a specific order even when encoded, and that order can be
normalized), and really, anyone who wants fully correct string comparisons
needs to be normalizing their strings. With that in mind, rcstring probably
should support normalization of its internal representation.
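
As a small sketch of why comparing bytes isn't enough without normalization,
again using plain strings and std.uni.normalize as a stand-in for whatever
rcstring would end up providing (the helper names are made up for the
example):

```d
import std.algorithm.comparison : equal;
import std.uni : NFC, normalize;
import std.utf : byCodeUnit;

// Hypothetical helper: compares raw code units, no normalization.
bool sameCodeUnits(string a, string b)
{
    return equal(a.byCodeUnit, b.byCodeUnit);
}

// Hypothetical helper: normalizes both sides to NFC before comparing.
bool sameAfterNFC(string a, string b)
{
    return normalize!NFC(a) == normalize!NFC(b);
}

void main()
{
    string precomposed = "\u00E4"; // "ä" as a single code point
    string decomposed = "a\u0308"; // 'a' plus U+0308 COMBINING DIAERESIS

    // Both render as "ä", but the code units differ.
    assert(!sameCodeUnits(precomposed, decomposed));

    // After normalizing both to the same form, they compare equal.
    assert(sameAfterNFC(precomposed, decomposed));
}
```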

- Jonathan M Davis