Why can't D store all UTF-8 code units in char type? (not really understanding explanation)
thebluepandabear
therealbluepandabear at protonmail.com
Fri Dec 2 21:18:44 UTC 2022
Hello (noob question),
I am reading a book about D by Ali, and he talks about the
different char types: char, wchar, and dchar. He says that char
stores a UTF-8 code unit, wchar stores a UTF-16 code unit, and
dchar stores a UTF-32 code unit. This makes sense.
He then goes on to say that:
"Contrary to some other programming languages, characters in D
may consist of different numbers of bytes. For example, because
'Ğ' must be represented by at least 2 bytes in Unicode, it
doesn't fit in a variable of type char. On the other hand,
because dchar consists of 4 bytes, it can hold any Unicode
character."
This is his explanation of why this code doesn't compile, even
though Ğ is a UTF-8 code unit:
```D
char utf8 = 'Ğ';
```
But I don't really understand this. What does it mean that it
'must be represented by at least 2 bytes'? If I do `char.sizeof`,
it's 2 bytes, so I am confused about why it doesn't fit. I don't
think it was explained well in the book.
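For comparison, here is a minimal sketch of the variants that do compile, going by the book's description. The variable names `utf32` and `text` are just ones I made up for illustration, and I'm assuming the source file is saved as UTF-8:

```D
import std.stdio;

void main()
{
    // char utf8 = 'Ğ';  // rejected by the compiler, as in the snippet above
    dchar utf32 = 'Ğ';    // a dchar is one UTF-32 code unit, so this compiles
    string text = "Ğ";    // a string is an array of UTF-8 code units (char)

    // Number of char elements (UTF-8 code units) used for this one character.
    writeln(text.length); // prints 2
}
```

So the single character takes two `char` elements when it is stored in a string, which seems to be what the quote is getting at.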
Any help would be appreciated.