std.experimental.collections.rcstring and its integration in Phobos

Wed Jul 18 12:15:52 UTC 2018

On Tuesday, 17 July 2018 at 18:09:13 UTC, Jonathan M Davis wrote:
> On Tuesday, July 17, 2018 17:28:19 Seb via Digitalmars-d wrote:
>> On Tuesday, 17 July 2018 at 16:58:37 UTC, Jonathan M Davis 
>> wrote:
>> > [...]
>>
>> Well, there are a few cases where the range type doesn't matter 
>> and one can simply compare bytes, e.g.
>>
>> equal (e.g. "ä" == "ä" <=> [195, 164] == [195, 164])
>> commonPrefix
>> find
>> ...
>
> That effectively means treating rcstring as a range of char by 
> default rather than not treating it as a range by default. And 
> if we then do that only with functions that overload on 
> rcstring rather than making rcstring actually a range of char, 
> then why aren't we just treating it as a range of char in 
> general?
>
> IMHO, the fact that so many algorithms currently special-case 
> on arrays of characters is one reason that auto-decoding has 
> been a disaster, and adding a bunch of overloads for rcstring 
> is just compounding the problem. Algorithms should properly 
> support arbitrary ranges of characters, and then rcstring can 
> be passed to them by calling one of the functions on it to get 
> a range of code units, code points, or graphemes to get an 
> actual range - either that, or rcstring should default to being 
> a range of char. Going halfway and making it work with some 
> functions via overloads really doesn't make sense.

Well, the problem with it being a range of char is that this can 
lead to very confusing behavior, e.g.

"ä".rcstring.split.join("|") == �|�
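
To make that concrete with what Phobos offers today, here is a minimal sketch that uses a plain `string` and `byCodeUnit` as a stand-in for an rcstring that is a range of char (the `isValidUtf8` helper is made up for this illustration). Cutting "ä" between its two code units leaves pieces that are no longer valid UTF-8:

```d
import std.range : walkLength;
import std.utf : byCodeUnit, UTFException, validate;

// Hypothetical helper: true if `s` is well-formed UTF-8.
bool isValidUtf8(string s)
{
    try
    {
        validate(s);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

void main()
{
    string s = "ä"; // U+00E4, encoded in UTF-8 as 0xC3 0xA4

    assert(s.walkLength == 1);            // auto-decoding: one code point
    assert(s.byCodeUnit.walkLength == 2); // range of char: two code units

    // Split between the two code units and join the halves with '|'.
    string joined = s[0 .. 1] ~ "|" ~ s[1 .. 2];

    assert(isValidUtf8(s));
    assert(!isValidUtf8(s[0 .. 1])); // lone lead byte
    assert(!isValidUtf8(s[1 .. 2])); // lone continuation byte
    assert(!isValidUtf8(joined));    // prints as two replacement characters
}
```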

So we probably shouldn't go this route either.
The idea behind adding overloads was to give users a bit of 
convenience, so that they don't have to write

readText("foo".rcstring.by!char)

all the time.
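
A rough sketch of what such a convenience overload amounts to is below. Note that `Str`, `units` and `unitCount` are names made up for this illustration (`Str` stands in for rcstring and `units` for `by!char`); none of this is the actual rcstring API:

```d
import std.range.primitives : ElementType, isInputRange;
import std.traits : isSomeChar;
import std.utf : byCodeUnit;

// Hypothetical stand-in for rcstring: deliberately NOT a range itself.
struct Str
{
    string data;

    // Hypothetical counterpart of `by!char`: an explicit range of code units.
    auto units() { return data.byCodeUnit; }
}

// Generic algorithm: accepts any input range of characters.
size_t unitCount(R)(R r)
if (isInputRange!R && isSomeChar!(ElementType!R))
{
    size_t n;
    foreach (c; r)
        ++n;
    return n;
}

// Convenience overload: forwards to the generic version so that callers
// don't have to spell out `.units` themselves.
size_t unitCount(Str s)
{
    return unitCount(s.units);
}

void main()
{
    auto s = Str("ä");
    assert(unitCount(s.units) == 2); // explicit range of code units
    assert(unitCount(s) == 2);       // same result via the overload
}
```

Every algorithm that wants this convenience needs its own forwarding overload, which is exactly the cost Jonathan is pointing at.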

> You can still normalize with auto-decoding (the code units - 
> and thus code points - are in a specific order even when 
> encoded, and that order can be normalized), and really, anyone 
> who wants fully correct string comparisons needs to be 
> normalizing their strings. With that in mind, rcstring probably 
> should support normalization of its internal representation.

rcstring currently doesn't support this out of the box, but it's a 
very valid point and I've added it to the list.
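
For reference, this is roughly what normalization looks like with plain strings and `std.uni.normalize` today. It is only a sketch of the behavior rcstring would need to offer, not a proposal for its API, and `equalNormalized` is a made-up helper name:

```d
import std.uni;

// Hypothetical helper: compare two strings after bringing both to NFC.
bool equalNormalized(string a, string b)
{
    return normalize!NFC(a) == normalize!NFC(b);
}

void main()
{
    string composed = "\u00E4";    // "ä" as a single precomposed code point
    string decomposed = "a\u0308"; // "a" followed by a combining diaeresis

    // Both render as "ä", but the code units differ.
    assert(composed != decomposed);
    assert(composed.length == 2 && decomposed.length == 3);

    // After normalization they compare equal.
    assert(normalize!NFC(decomposed) == composed);
    assert(normalize!NFD(composed) == decomposed);
    assert(equalNormalized(composed, decomposed));
}
```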