Characters in D

Eugene lecom at yandex.ru
Sat Nov 2 18:45:50 UTC 2019


On Saturday, 2 November 2019 at 18:26:57 UTC, user4567 wrote:
> On Saturday, 2 November 2019 at 18:09:01 UTC, Eugene wrote:
>> On Saturday, 2 November 2019 at 15:54:02 UTC, Adam D. Ruppe 
>> wrote:
>>> On Saturday, 2 November 2019 at 15:44:49 UTC, Eugene wrote:
>>>> "Variable of type char can only hold letters that are in the 
>>>> ASCII table". (section 15.4 Character literals)
>>>> So why does the following code run?
>>>
>>> The individual char can only hold those, but a group of chars 
>>> can hold anything.
>>>
>>>> char[] cyrillics = "привет".dup;
>>>
>>> this works because the "string" has multi-char groupings
>>>
>>>> char[] cyrillics = ['п', 'р', 'и', 'в', 'е', 'т']; //not
>>>
>>> and this doesn't because you are specifying individual items 
>>> there so it can't just spread them across multiple bytes
>>
>> Um. It is not obvious at all. What does "spread across 
>> multiple bytes" mean?
>
> it's encoded in UTF-8, for example the **string** "п" takes 2 
> `char`s, although it's only one grapheme.
>
>     assert("привет".length == 12); // encoded as UTF-8
>     assert("привет"d.length == 6); // decoded, each dchar is 4 bytes
>     // and can contain a Cyrillic character.

"п" is represented by two code units, but "п"d is represented by 
one code point, hence 12 and 6 respectively. The dup function 
operates on code units and copies them into a char[]. So?
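
To make the counts above concrete, here is a minimal sketch, assuming a 
recent D compiler with Phobos. The variable names are only illustrative, 
and the commented-out line is the single-char case that is expected to be 
rejected by the compiler.

```d
import std.utf : count;

void main()
{
    // A string literal is immutable(char)[], i.e. a sequence of UTF-8 code units.
    string s = "привет";
    assert(s.length == 12); // 6 Cyrillic letters, 2 code units each

    // .dup copies the code units as they are; nothing is decoded.
    char[] copy = s.dup;
    assert(copy.length == 12);
    assert(copy[0] == 0xD0 && copy[1] == 0xBF); // 'п' (U+043F) encoded as UTF-8

    // The d suffix selects UTF-32: one code unit per code point.
    dstring d = "привет"d;
    assert(d.length == 6);

    // Counting code points in the UTF-8 copy also gives 6.
    assert(count(copy) == 6);

    // A single char is one UTF-8 code unit, so it cannot hold 'п':
    // char c = 'п'; // expected to be a compile error
}
```

So .length on a char[] counts code units, while std.utf.count walks the 
same bytes and counts code points.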

