Characters in D

user4567 user4567 at 1234.te
Sat Nov 2 20:49:15 UTC 2019


On Saturday, 2 November 2019 at 18:45:50 UTC, Eugene wrote:
> On Saturday, 2 November 2019 at 18:26:57 UTC, user4567 wrote:
>> On Saturday, 2 November 2019 at 18:09:01 UTC, Eugene wrote:
>>> On Saturday, 2 November 2019 at 15:54:02 UTC, Adam D. Ruppe 
>>> wrote:
>>>> On Saturday, 2 November 2019 at 15:44:49 UTC, Eugene wrote:
>>>>> "Variable of type char can only hold letters that are in 
>>>>> the ASCII table". (section 15.4 Character literals)
>>>>> So why there is executed next code?
>>>>
>>>> The individual char can only hold those, but a group of 
>>>> chars can hold anything.
>>>>
>>>>> char[] cyrillics = "привет".dup;
>>>>
>>>> this works because the "string" has multi-char groupings
>>>>
>>>>> char[] cyrillics = ['п', 'р', 'и', 'в', 'е', 'т']; //not
>>>>
>>>> and this doesn't because you are specifying individual items 
>>>> there so it can't just spread them across multiple bytes
>>>
>>> Um. It is not obvious at all. What's mean spread across 
>>> multiple bytes?
>>
>> it's encoded in UTF-8, for example the **string** "п" takes 2 
>> `char`s, although it's only one grapheme.
>>
>>     assert("привет".length == 12); // encoded as UTF-8
>>     assert("привет"d.length == 6); // decoded: each dchar is 4 bytes
>>     // and can contain a Cyrillic character.
>
> "п" is represented by two code units, but "п"d is represented 
> by one code point, therefore 12 and 6 respectively. Function 
> dup manipulates by code units and represents their to char[]. 
> So?

Oh, I see what you are asking. At first we thought that you hadn't 
grasped the implications of the encoding. So it's just a rule: if you 
use `char` literals, they must be ASCII.
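
To make the rule concrete, here is a minimal sketch of the two cases from the thread plus one possible workaround. The function name `encodeLiterals` is made up for the example, and going through `dchar[]` with `std.utf.toUTF8` is only one option, not necessarily the recommended one.

```d
import std.utf : toUTF8;

// Hypothetical helper: build the text from individual character
// literals, then encode it to UTF-8 explicitly.
char[] encodeLiterals()
{
    // Each element is one code point, so dchar (4 bytes) can hold it.
    dchar[] codePoints = ['п', 'р', 'и', 'в', 'е', 'т'];
    assert(codePoints.length == 6);

    // toUTF8 returns a string; dup gives a mutable char[].
    return toUTF8(codePoints).dup;
}

void main()
{
    // A string literal is already stored as UTF-8 code units,
    // so each Cyrillic letter occupies two chars.
    char[] fromString = "привет".dup;
    assert(fromString.length == 12);

    // A non-ASCII character literal is not typed as char...
    static assert(!is(typeof('п') == char));
    // ...so an array of such literals is rejected as a char[].
    static assert(!__traits(compiles, { char[] c = ['п', 'р']; }));

    // Encoding the code points explicitly yields the same 12 code units.
    assert(encodeLiterals() == fromString);
}
```

The `static assert` lines only check that the compiler rejects the `char[]` form; they don't say anything about why the language was designed that way.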

The rationale could be that this rule avoids bad surprises about the 
length of the array; otherwise I can't imagine anything else. Only the 
original designers (so Bright) would know the exact rationale... I can't 
say more.


More information about the Digitalmars-d mailing list