Proposal for fixing dchar ranges

Tue Mar 11 11:02:26 PDT 2014

On Tue, 11 Mar 2014 13:18:46 -0400, Chris Williams  
<yoreanon-chrisw at yahoo.co.jp> wrote:

> On Tuesday, 11 March 2014 at 14:16:31 UTC, Steven Schveighoffer wrote:
>> But I would never expect any kind of indexing or slicing to use "number  
>> of code points", which clearly requires O(n) decoding to determine its  
>> position. That would be disastrous.
>
> If the indexes put into the slice aren't by code-point, but people need  
> to use proper helper functions to convert a code-point into an index,  
> then we're basically back to where we are today.

No. Where we are today is that in some cases the language treats a char[]  
as an array of char, and in other cases it treats a char[] as a  
bidirectional dchar range.

What I'm proposing is we have a type that defines "This is what a string  
looks like," and it is consistent across all uses of the string, instead  
of the schizophrenic view we have now. I would also point out that quite a  
bit of deception and nonsense is needed to maintain that view, including  
things like assert(!hasLength!(char[]) && __traits(compiles, { char[] x;  
size_t y = x.length;})). The documentation for hasLength says "Tests if a  
given range has the length attribute," which is clearly a lie.
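
A minimal sketch of the two views side by side, assuming Phobos's
auto-decoding of narrow strings (the sample string is made up for
illustration and is not from the thread):

```d
import std.range : ElementType, hasLength, isBidirectionalRange,
                   isRandomAccessRange, walkLength;

void main()
{
    // "héllo": 'é' (U+00E9) takes two UTF-8 code units.
    char[] x = "h\u00E9llo".dup;

    // Language view: an array of char with O(1) length and indexing.
    static assert(is(typeof(x[0]) == char));
    assert(x.length == 6);              // code units

    // Range view: a bidirectional range of dchar, with no length
    // and no random access.
    static assert(is(ElementType!(char[]) == dchar));
    static assert(isBidirectionalRange!(char[]));
    static assert(!hasLength!(char[]));
    static assert(!isRandomAccessRange!(char[]));
    assert(x.walkLength == 5);          // code points, O(n) to compute
}
```

The same variable reports a length of 6 through the array interface and 5
elements through the range interface, which is the inconsistency at issue.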

However, I want to state right here that an index is not a number of code  
points. One does not frequently get code point counts; one gets indexes.  
It has always been that way, and I'm not planning to change that. That you  
can't use an index to determine the number of code points that came  
before it is not an issue that arises frequently.

For example, if I want to find the first instance of "xyz" in a string, do  
I care how many code points the search has to go through, or at what point  
I have to slice the string to get it?

A previous poster brings up this incorrect code, which uses the code point  
count returned by countUntil as a slice index:

auto index = countUntil(str, "xyz");
auto newstr = str[index..$];

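But it can easily be done correctly this way, since indexOf returns an  
index that is valid for slicing (the code point count, if it is needed at  
all, comes from walkLength):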

auto index = indexOf(str, "xyz");
auto codepts = walkLength(str[0..index]);
auto newstr = str[index..$];
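
To make the difference concrete, here is a hedged, self-contained sketch
assuming current Phobos behavior for narrow strings. The sample string and
the helper name sliceFromFirst are hypothetical, chosen only for
illustration:

```d
import std.algorithm : countUntil;
import std.range : walkLength;
import std.string : indexOf;

// Hypothetical helper: slice from the first occurrence of needle,
// or return null when the needle is absent (indexOf returns -1).
string sliceFromFirst(string s, string needle)
{
    auto index = indexOf(s, needle);    // code-unit index
    return index < 0 ? null : s[index .. $];
}

void main()
{
    // 'é' (U+00E9) is one code point but two UTF-8 code units.
    string str = "h\u00E9llo xyz!";

    assert(countUntil(str, "xyz") == 6);    // code points before "xyz"
    assert(indexOf(str, "xyz") == 7);       // code units before "xyz"
    assert(walkLength(str[0 .. 7]) == 6);   // recover the count if needed

    assert(sliceFromFirst(str, "xyz") == "xyz!");
    // Slicing with the code point count lands one code unit too early.
    assert(str[6 .. $] == " xyz!");
}
```

With pure ASCII input both numbers agree, which is why the mistake tends to
go unnoticed until non-ASCII text shows up.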

Given how D works, I think it would be very costly, and nearly impossible,  
to statically reject the incorrect slice operation. One simply has to  
learn what a code point is and what a code unit is. HOWEVER, for the most  
part, nobody needs to care. Strings work fine without having to randomly  
access specific code points or slice based on them. Using indexes works  
just fine.

-Steve