Proposal for fixing dchar ranges
Steven Schveighoffer
schveiguy at yahoo.com
Mon Mar 10 13:00:18 PDT 2014
On Mon, 10 Mar 2014 15:30:00 -0400, John Colvin
<john.loughran.colvin at gmail.com> wrote:
> On Monday, 10 March 2014 at 18:09:51 UTC, Steven Schveighoffer wrote:
>>
>> Because one can slice out a multi-code-unit code point, one cannot
>> access it via index. Strings would be horribly crippled without
>> slicing. Without indexing, they are fine.
>>
>> A possibility is to allow index, but actually decode the code point at
>> that index (error on invalid index). That might actually be the correct
>> mechanism.
>>
>
> In order to be correct, both require exactly the same knowledge: The
> beginning of a code point, followed by the end of a code point. In the
> indexing case they just happen to be the same code-point and happen to
> be one code unit from each other. I don't see how one is any more or
> less error-prone or fundamentally wrong than the other.
Using indexing, you simply cannot get a multi-code-unit code point out as
a single code unit. It doesn't fit in a char. It's guaranteed to fail,
whereas slicing will give you access to all the data in the string.
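
To make the difference concrete, here is a minimal sketch of what indexing and slicing return today for a string containing a two-code-unit code point. The helper name codeUnitAt is made up for illustration, and the sample string is my own:

```d
/// Hypothetical helper: returns the raw code unit stored at index i.
char codeUnitAt(string s, size_t i)
{
    return s[i];
}

unittest
{
    // U+00E9 (e with acute accent) takes two UTF-8 code units: 0xC3 0xA9.
    string s = "a\u00E9!";
    assert(s.length == 4);              // length counts code units

    // Indexing hands back one code unit, which is only part of the code point.
    assert(codeUnitAt(s, 1) == 0xC3);   // lead byte
    assert(codeUnitAt(s, 2) == 0xA9);   // continuation byte

    // A slice can span both code units, so all the data is reachable.
    assert(s[1 .. 3] == "\u00E9");
}
```

Compiling with something like `dmd -unittest -main` should run the checks.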
Now, with indexing actually decoding a code point, one can alias a[i] to
a[i..$].front(), which means decode the first code point you come to at
index i. This means indexing is slow(er), and returns a dchar. I think as
a first step, that might be too much to add silently. I'd rather break it
first, then add it back later.
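
A rough sketch of what decoding-on-index could look like, written as a wrapper rather than a change to string itself. DecodingString is a hypothetical type, not anything in Phobos; it leans on std.utf.decode, which throws a UTFException when the index does not sit at the start of a valid sequence:

```d
import std.exception : assertThrown;
import std.utf : decode, UTFException;

/// Hypothetical wrapper (not part of Phobos): indexing decodes a code point.
struct DecodingString
{
    string data;

    /// Decodes the code point that starts at code unit i and returns it
    /// as a dchar. An index that lands in the middle of a multi-code-unit
    /// sequence makes std.utf.decode throw a UTFException.
    dchar opIndex(size_t i)
    {
        size_t pos = i;            // decode advances its index argument
        return decode(data, pos);
    }
}

unittest
{
    auto s = DecodingString("a\u00E9!");
    assert(s[0] == 'a');
    assert(s[1] == '\u00E9');              // decoded from two code units
    assert(s[3] == '!');
    assertThrown!UTFException(s[2]);       // index 2 is a continuation byte
}
```

Note the cost: each s[i] runs the decoder instead of doing a plain array load, and the result type is dchar rather than char, which is the silent change in behaviour described above.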
-Steve