Proposal for fixing dchar ranges

Steven Schveighoffer schveiguy at yahoo.com
Mon Mar 10 13:00:18 PDT 2014


On Mon, 10 Mar 2014 15:30:00 -0400, John Colvin  
<john.loughran.colvin at gmail.com> wrote:

> On Monday, 10 March 2014 at 18:09:51 UTC, Steven Schveighoffer wrote:
>>
>> Because one can slice out a multi-code-unit code point, whereas one
>> cannot access it via index. Strings would be horribly crippled without
>> slicing. Without indexing, they are fine.
>>
>> A possibility is to allow indexing, but actually decode the code point
>> at that index (error on invalid index). That might actually be the
>> correct mechanism.
>>
>
> In order to be correct, both require exactly the same knowledge: The  
> beginning of a code point, followed by the end of a code point. In the  
> indexing case they just happen to be the same code-point and happen to  
> be one code unit from each other. I don't see how one is any more or
> less error-prone or fundamentally wrong than the other.

Using indexing, you simply cannot get a multi-code-unit code point: a[i]
returns a single code unit (a char), and such a code point does not fit in
one. It's guaranteed to fail, whereas slicing gives you access to all the
data in the string.
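To make the distinction concrete, here is a minimal sketch (an illustration, not code from the original thread) using Phobos' `std.utf.stride`, which reports how many code units the code point starting at a given index occupies. It assumes UTF-8 `string` data, where `é` (U+00E9) is stored as the two code units 0xC3 0xA9: `s[3]` yields only the first of them, while a slice can hold both.

```d
import std.stdio : writeln;
import std.utf : stride;

void main()
{
    // "é" (U+00E9) is encoded in UTF-8 as two code units: 0xC3 0xA9.
    string s = "caf\u00E9";
    assert(s.length == 5); // length counts code units, not code points

    // Indexing yields one code unit (a char): only the first half of 'é'.
    char unit = s[3];
    assert(unit == 0xC3);

    // Slicing can cover every code unit of the code point.
    // stride(s, i) returns how many code units the code point at i spans.
    size_t n = stride(s, 3);
    string whole = s[3 .. 3 + n];
    assert(n == 2);
    assert(whole == "\u00E9");

    writeln(whole); // prints é on a UTF-8 terminal
}
```

Note that `stride` encodes exactly the boundary knowledge John describes; the difference is that the slice has somewhere to put the result, while a single `char` does not.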

Now, with indexing actually decoding a code point, one can alias a[i] to
a[i..$].front(), which means decoding the code point that starts at index
i. Indexing then becomes slower and returns a dchar. I think as a first
step, that might be too much to add silently. I'd rather break it first,
then add it back later.
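The proposed semantics can be sketched as follows. `decodeAt` is a hypothetical helper, not an existing Phobos function; it simply spells out `a[i .. $].front`, which auto-decodes the code point starting at code-unit index `i` and returns a `dchar`. When `i` lands inside a code point, the decode throws `std.utf.UTFException`, one way to realize the "error on invalid index" behavior mentioned in the quoted text.

```d
import std.exception : assertThrown;
import std.range.primitives : front;
import std.stdio : writeln;
import std.utf : UTFException;

/// Hypothetical helper (illustration only): the proposed meaning of a[i].
/// Decodes the code point starting at code-unit index i and returns a dchar.
/// Throws UTFException if i points into the middle of a code point.
dchar decodeAt(string a, size_t i)
{
    return a[i .. $].front; // front auto-decodes narrow strings
}

void main()
{
    string s = "caf\u00E9";

    assert(decodeAt(s, 0) == 'c');      // single-unit code point
    assert(decodeAt(s, 3) == '\u00E9'); // two code units decoded into one dchar

    // Index 4 is the second code unit of 'é', not the start of a code point.
    assertThrown!UTFException(decodeAt(s, 4));

    writeln(decodeAt(s, 3)); // prints é on a UTF-8 terminal
}
```

Each call runs the UTF-8 decoder instead of performing a single array load, which is where the extra cost of this form of indexing comes from.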

-Steve


More information about the Digitalmars-d mailing list