std.experimental.collections.rcstring and its integration in Phobos
Seb
seb at wilzba.ch
Wed Jul 18 12:15:52 UTC 2018
On Tuesday, 17 July 2018 at 18:09:13 UTC, Jonathan M Davis wrote:
> On Tuesday, July 17, 2018 17:28:19 Seb via Digitalmars-d wrote:
>> On Tuesday, 17 July 2018 at 16:58:37 UTC, Jonathan M Davis
>> wrote:
>> > [...]
>>
>> Well, there are a few cases where the range type doesn't matter
>> and one can simply compare bytes, e.g.
>>
>> equal (e.g. "ä" == "ä" <=> [195, 164] == [195, 164])
>> commonPrefix
>> find
>> ...
>
> That effectively means treating rcstring as a range of char by
> default rather than not treating it as a range by default. And
> if we then do that only with functions that overload on
> rcstring rather than making rcstring actually a range of char,
> then why aren't we just treating it as a range of char in
> general?
>
> IMHO, the fact that so many algorithms currently special-case
> on arrays of characters is one reason that auto-decoding has
> been a disaster, and adding a bunch of overloads for rcstring
> is just compounding the problem. Algorithms should properly
> support arbitrary ranges of characters, and then rcstring can
> be passed to them by calling one of the functions on it to get
> a range of code units, code points, or graphemes to get an
> actual range - either that, or rcstring should default to being
> a range of char. Going halfway and making it work with some
> functions via overloads really doesn't make sense.
Well, the problem with it being a range of char is that this
might lead to very confusing behavior, e.g.
"ä".rcstring.split.join("|") == �|�
So we probably shouldn't go this route either.
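
To make the failure mode concrete, here is a minimal sketch using a
plain D string as a stand-in for rcstring. The helper joinCodeUnits is
hypothetical and only mimics what a split/join over a range of char
would do; it is not part of the rcstring proposal. Comparing code units
is harmless, but separating them yields invalid UTF-8:

```d
import std.algorithm.comparison : equal;
import std.exception : assertThrown;
import std.string : representation;
import std.utf : byCodeUnit, validate, UTFException;

// Hypothetical helper (an assumption for illustration): puts `sep`
// between every UTF-8 code unit of `s`, the way a split/join over a
// range of char would.
string joinCodeUnits(string s, string sep)
{
    string result;
    foreach (i; 0 .. s.length)
    {
        if (i > 0)
            result ~= sep;
        result ~= s[i .. i + 1]; // one code unit, not one character
    }
    return result;
}

void main()
{
    string s = "\u00E4"; // "ä", encoded in UTF-8 as the bytes 195, 164
    immutable(ubyte)[] expected = [195, 164];
    assert(s.representation == expected);

    // Comparing code unit by code unit is safe for identically
    // encoded strings.
    assert(equal(s.byCodeUnit, "\u00E4".byCodeUnit));

    // Separating the code units tears the character apart: the result
    // is a lead byte, '|', and a stray continuation byte.
    string broken = joinCodeUnits(s, "|");
    assert(broken.length == 3);
    assertThrown!UTFException(validate(broken));
}
```
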
The idea of adding overloads was to introduce a bit of
user-convenience, so that they don't have to write
readText("foo".rcstring.by!char)
all the time.
> You can still normalize with auto-decoding (the code units -
> and thus code points - are in a specific order even when
> encoded, and that order can be normalized), and really, anyone
> who wants fully correct string comparisons needs to be
> normalizing their strings. With that in mind, rcstring probably
> should support normalization of its internal representation.
It currently doesn't support this out of the box, but it's a very
valid point and I added it to the list.
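
As a rough sketch of why normalization matters for comparison, the
following uses std.uni.normalize on plain D strings, since rcstring has
no such support yet. The helper sameAfterNormalization is a hypothetical
name for illustration, not a proposed API:

```d
import std.uni;

// Hypothetical helper (an assumption for illustration): compares two
// strings after bringing both into NFC.
bool sameAfterNormalization(string a, string b)
{
    return normalize!NFC(a) == normalize!NFC(b);
}

void main()
{
    string precomposed = "\u00E4"; // "ä" as a single code point
    string decomposed = "a\u0308"; // "a" plus combining diaeresis

    // Both render as "ä", yet their code units differ.
    assert(precomposed != decomposed);

    // After normalization the two compare equal.
    assert(sameAfterNormalization(precomposed, decomposed));
    assert(normalize!NFD(precomposed) == normalize!NFD(decomposed));
}
```
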