How to print unicode characters (no library)?
Patrick.Schluter at bbox.fr
Tue Dec 28 12:20:17 UTC 2021
On Monday, 27 December 2021 at 07:12:24 UTC, rempas wrote:
> I don't understand that. Based on your calculations, the
> results should have been different. Also how are the numbers
> fixed? Like you said the amount of bytes of each encoding is
> not always standard for every character. Even if they were
> fixed this means 2-bytes for each UTF-16 character and 4-bytes
> for each UTF-32 character so still the numbers doesn't make
> sense to me. So still the number of the "length" property
> should have been the same for every encoding or at least for
> UTF-16 and UTF-32. So are the sizes of every character fixed or
Your string consists of 8 code points. The number of code units
needed to represent them in memory depends on the encoding. D
supports three encodings (the Unicode standard defines more than
these three):
string utf8s = "Hello 😂\n";
wstring utf16s = "Hello 😂\n"w;
dstring utf32s = "Hello 😂\n"d;
Here is the sequence of Unicode code points in your string:
H      e      l      l      o      (space) 😂      \n
U+0048 U+0065 U+006C U+006C U+006F U+0020  U+1F602 U+000A
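To check the list above yourself, here is a minimal sketch that lets foreach decode the UTF-8 string one code point at a time. The helper name countCodePoints is made up for this illustration, and it uses std.stdio only for printing.

```d
import std.stdio : writef, writeln;

// Hypothetical helper: counts code points by letting foreach
// decode the UTF-8 string into dchars.
size_t countCodePoints(string s)
{
    size_t n = 0;
    foreach (dchar c; s)
        ++n;
    return n;
}

void main()
{
    string utf8s = "Hello 😂\n";

    // Asking for dchar in foreach decodes one code point per iteration.
    foreach (dchar c; utf8s)
        writef("U+%04X ", cast(uint) c);
    writeln();

    writeln(countCodePoints(utf8s), " code points");
}
```

It should print the same eight U+ values as in the table, followed by "8 code points".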
Let's see how these three variables are represented in memory:
utf8s : 48 65 6C 6C 6F 20 F0 9F 98 82 0A
        11 chars in memory using 11 bytes
utf16s: 0048 0065 006C 006C 006F 0020 D83D DE02 000A
        9 wchars in memory using 18 bytes
utf32s: 00000048 00000065 0000006C 0000006C 0000006F 00000020
        0001F602 0000000A
        8 dchars in memory using 32 bytes
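If you want to reproduce these dumps, something like the following should do it. It is only a sketch: it relies on std.string.representation to view the code units as plain integers and on the %( %) range format of writefln.

```d
import std.stdio : writefln;
import std.string : representation;

void main()
{
    string  utf8s  = "Hello 😂\n";
    wstring utf16s = "Hello 😂\n"w;
    dstring utf32s = "Hello 😂\n"d;

    // .length is the number of code units; multiply by the unit size for bytes.
    writefln("utf8s : %s chars,  %s bytes", utf8s.length,  utf8s.length  * char.sizeof);
    writefln("utf16s: %s wchars, %s bytes", utf16s.length, utf16s.length * wchar.sizeof);
    writefln("utf32s: %s dchars, %s bytes", utf32s.length, utf32s.length * dchar.sizeof);

    // representation() exposes the raw code units as ubyte / ushort / uint.
    writefln("%(%02X %)", utf8s.representation);
    writefln("%(%04X %)", utf16s.representation);
    writefln("%(%08X %)", utf32s.representation);
}
```

Note that .length never counts code points or characters, only code units of the array's element type.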
As you can see, the most compact form is generally UTF-8, which
is why it is the preferred encoding for Unicode.
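Since the thread is about doing this without a library, here is a rough sketch of how a single code point can be turned into UTF-8 bytes by hand. The function name encodeUtf8 is my own, and the sketch deliberately skips validation (surrogates, values above U+10FFFF). It uses only C's printf for output.

```d
// Hypothetical helper: encodes one code point as UTF-8 into buf and
// returns the number of bytes written. No validation is performed.
size_t encodeUtf8(dchar cp, ref ubyte[4] buf)
{
    if (cp < 0x80)
    {
        buf[0] = cast(ubyte) cp;                          // 0xxxxxxx
        return 1;
    }
    if (cp < 0x800)
    {
        buf[0] = cast(ubyte)(0xC0 | (cp >> 6));           // 110xxxxx
        buf[1] = cast(ubyte)(0x80 | (cp & 0x3F));         // 10xxxxxx
        return 2;
    }
    if (cp < 0x10000)
    {
        buf[0] = cast(ubyte)(0xE0 | (cp >> 12));          // 1110xxxx
        buf[1] = cast(ubyte)(0x80 | ((cp >> 6) & 0x3F));
        buf[2] = cast(ubyte)(0x80 | (cp & 0x3F));
        return 3;
    }
    buf[0] = cast(ubyte)(0xF0 | (cp >> 18));              // 11110xxx
    buf[1] = cast(ubyte)(0x80 | ((cp >> 12) & 0x3F));
    buf[2] = cast(ubyte)(0x80 | ((cp >> 6) & 0x3F));
    buf[3] = cast(ubyte)(0x80 | (cp & 0x3F));
    return 4;
}

void main()
{
    import core.stdc.stdio : printf;

    ubyte[4] buf;
    immutable n = encodeUtf8('\U0001F602', buf);
    foreach (b; buf[0 .. n])
        printf("%02X ", cast(uint) b);
    printf("\n");
}
```

For U+1F602 this gives the four bytes F0 9F 98 82, while every ASCII character stays a single byte, which is where the compactness comes from.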
UTF-16 is supported for legacy reasons: it is used by the Windows
API and internally by Java.
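In UTF-16, a code point above U+FFFF such as U+1F602 is stored as two code units, a surrogate pair. The arithmetic can be sketched as follows; toSurrogatePair is a name invented for this example, and the caller is assumed to pass a value in the range U+10000 to U+10FFFF.

```d
// Hypothetical helper: splits a code point above U+FFFF into a
// UTF-16 surrogate pair. No range checking is performed.
wchar[2] toSurrogatePair(dchar cp)
{
    immutable uint v = cp - 0x10000;              // 20 significant bits remain
    wchar[2] pair;
    pair[0] = cast(wchar)(0xD800 + (v >> 10));    // high surrogate: top 10 bits
    pair[1] = cast(wchar)(0xDC00 + (v & 0x3FF));  // low surrogate: bottom 10 bits
    return pair;
}

void main()
{
    import core.stdc.stdio : printf;

    immutable pair = toSurrogatePair('\U0001F602');
    printf("%04X %04X\n", cast(uint) pair[0], cast(uint) pair[1]);
}
```

For U+1F602: 0x1F602 - 0x10000 = 0xF602, the top 10 bits are 0x3D and the bottom 10 bits are 0x202, giving D83D DE02.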
UTF-32 has one advantage: a 1-to-1 mapping between code point and
array index. In practice that is not much of an advantage,
because code points and characters are distinct concepts. UTF-32
uses a lot of memory for practically no benefit (when you read in
the forum about D's big auto-decoding mistake, it is linked to
this).
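To illustrate the difference between code points and characters, here is a small sketch using std.uni.byGrapheme and std.range.walkLength from Phobos. The sample string "e" plus a combining accent is my own choice, not something from your program.

```d
import std.range : walkLength;
import std.stdio : writeln;
import std.uni : byGrapheme;

void main()
{
    // "é" written as 'e' followed by U+0301 COMBINING ACUTE ACCENT.
    dstring s = "e\u0301"d;

    writeln(s.length);                 // 2: code units (= code points in UTF-32)
    writeln(s.byGrapheme.walkLength);  // 1: user-perceived character

    // Phobos range functions auto-decode narrow strings to dchar,
    // so walkLength counts code points while .length counts code units.
    string utf8s = "Hello 😂\n";
    writeln(utf8s.length);             // 11
    writeln(utf8s.walkLength);         // 8
}
```

So even with a dstring, index i is not guaranteed to be the i-th character as a reader would see it.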