Proposal for fixing dchar ranges
Steven Schveighoffer
schveiguy at yahoo.com
Tue Mar 11 11:02:26 PDT 2014
On Tue, 11 Mar 2014 13:18:46 -0400, Chris Williams
<yoreanon-chrisw at yahoo.co.jp> wrote:
> On Tuesday, 11 March 2014 at 14:16:31 UTC, Steven Schveighoffer wrote:
>> But I would never expect any kind of indexing or slicing to use "number
>> of code points", which clearly requires O(n) decoding to determine its
>> position. That would be disastrous.
>
> If the indexes put into the slice aren't by code-point, but people need
> to use proper helper functions to convert a code-point into an index,
> then we're basically back to where we are today.
No. Where we are today is that in some cases the language treats a char[]
as an array of char, and in other cases it treats a char[] as a
bidirectional dchar range.
What I'm proposing is that we have a type that defines "This is what a
string looks like," and that it is consistent across all uses of the
string, instead of the schizophrenic view we have now. I would also point
out that quite a bit of deception and nonsense is needed to maintain that
view, including things like this:
assert(!hasLength!(char[]) &&
__traits(compiles, { char[] x; size_t y = x.length; }));
The documentation for hasLength says "Tests if a given range has the
length attribute," which is clearly a lie.
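To make the two views concrete, here is a minimal sketch. It assumes a
Phobos in which narrow strings are auto-decoded by the range primitives
(the behavior this thread is about); the variable name s and the sample
text are only for illustration.

```d
import std.range;

// Range traits: char[] is a range of dchar with no length or random access.
static assert(is(ElementType!(char[]) == dchar));
static assert(!hasLength!(char[]));
static assert(!isRandomAccessRange!(char[]));

void main()
{
    char[] s = "h\u00E9".dup;   // 'h' plus U+00E9, which is two UTF-8 code units

    // Array view: length and indexing work in code units.
    static assert(is(typeof(s[0]) == char));
    static assert(is(typeof(s.length) == size_t));
    assert(s.length == 3);

    // Range view: front decodes a whole code point.
    static assert(is(typeof(s.front) == dchar));
    assert(s.walkLength == 2);
}
```

The same variable answers "how long are you?" with 3 through the array
view and 2 when walked as a range, and the traits claim it has no length
at all.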
However, I want to define right here that an index is not a number of code
points; it is an offset in code units. One does not frequently get code
point counts; one gets indexes. It has always been that way, and I'm not
planning to change that. That you can't use an index to determine how many
code points came before it is not an issue that arises frequently.
For example, if I want to find the first instance of "xyz" in a string, do
I care how many code points precede it, or do I care at what point I have
to slice the string to get it?
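A previous poster brings up this incorrect code, which slices with a code
point count instead of a code unit index:
auto index = countUntil(str, "xyz"); // number of code points before "xyz"
auto newstr = str[index..$];         // wrong if non-ASCII text precedes "xyz"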
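But it can easily be done correctly, and the code point count is still
available if you really want it:
auto index = indexOf(str, "xyz");         // code unit index, valid for slicing
auto codepts = walkLength(str[0..index]); // code points before "xyz", O(n)
auto newstr = str[index..$];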
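As a rough, self-contained illustration of the difference, the sketch
below runs both approaches on a string with one non-ASCII character before
the match. The sample string is my own, and it assumes countUntil counts
decoded code points for narrow strings, as it does with auto-decoding.

```d
import std.algorithm : countUntil;
import std.range : walkLength;
import std.string : indexOf;

void main()
{
    string str = "h\u00E9llo xyz!";  // U+00E9 occupies two UTF-8 code units

    auto cpCount = countUntil(str, "xyz"); // code points before the match
    auto index = indexOf(str, "xyz");      // code unit offset of the match

    assert(cpCount == 6);
    assert(index == 7);

    // Slicing with the code unit index lands exactly on the match.
    assert(str[index .. $] == "xyz!");

    // Slicing with the code point count starts one code unit too early.
    assert(str[cpCount .. $] == " xyz!");

    // The code point count can still be recovered from the index.
    assert(walkLength(str[0 .. index]) == cpCount);
}
```

With more non-ASCII text in front of the match the error grows, and the
bad slice can even start in the middle of a multi-byte sequence.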
Given how D works, I think it would be very costly and nearly impossible
to have the incorrect slice operation statically rejected. One simply has
to learn what a code point is and what a code unit is. HOWEVER, for the
most part, nobody needs to care. Strings work fine without having to
randomly access specific code points or slice based on them. Using indexes
works just fine.
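A small sketch of what "using indexes" looks like in practice: the helper
splitKeyValue below is hypothetical (it is not part of Phobos), and it
never counts code points, yet it handles non-ASCII text correctly because
indexOf returns a code unit index that always falls on a code point
boundary.

```d
import std.string : indexOf;

// Hypothetical helper: splits "key=value" at the first '='.
// Returns the whole line as the key if there is no '='.
string[2] splitKeyValue(string line)
{
    auto eq = indexOf(line, '=');   // code unit index, or -1 if absent
    if (eq < 0)
        return [line, ""];
    return [line[0 .. eq], line[eq + 1 .. $]];
}

void main()
{
    auto kv = splitKeyValue("na\u00EFve=caf\u00E9");
    assert(kv[0] == "na\u00EFve");
    assert(kv[1] == "caf\u00E9");
}
```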
-Steve