Why can't D store all UTF-8 code units in char type? (not really understanding explanation)
ag0aep6g
anonymous at example.com
Sat Dec 3 13:02:27 UTC 2022
On 02.12.22 22:39, thebluepandabear wrote:
> Hm, that specifically might not be. The thing is, I thought a UTF-8 code
> unit can store 1-4 bytes for each character, so how is it right to say
> that `char` is a utf-8 code unit, it seems like it's just an ASCII code
> unit.
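You're simply not using the term "code unit" correctly. A UTF-8 code
unit is a single byte, i.e. just one of those 1-4 bytes. Together, the
1-4 code units form a "sequence" which encodes one "code point". So a
`char` holds one code unit, not necessarily a whole character.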
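
To make the distinction concrete, here is a small sketch in D. The helper names `codeUnits` and `codePoints` are made up for this illustration and are not part of Phobos. It relies on `.length` of a `string` counting `char`s (code units) and on `foreach` with a `dchar` loop variable decoding one code point per iteration.

```d
import std.stdio : writefln;

// Number of UTF-8 code units in s: .length on a string counts chars (bytes).
size_t codeUnits(string s)
{
    return s.length;
}

// Number of code points in s: iterating with a dchar loop variable
// decodes one whole UTF-8 sequence per iteration.
size_t codePoints(string s)
{
    size_t n = 0;
    foreach (dchar c; s)
        ++n;
    return n;
}

void main()
{
    // U+00E9 (e with acute accent) is encoded in UTF-8 as 0xC3 0xA9.
    string s = "\u00E9";

    // Iterating with a char loop variable visits each code unit separately.
    foreach (char c; s)
        writefln("code unit: 0x%02X", cast(ubyte) c);

    writefln("%s code units, %s code point(s)", codeUnits(s), codePoints(s));
}
```

For this string the loop prints the two code units 0xC3 and 0xA9, and the counts come out as 2 code units and 1 code point. Neither `char` on its own is a character; only the sequence is.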
And all (true) ASCII code units are indeed also valid UTF-8 code units,
because UTF-8 is a superset of ASCII. If you save a file as ASCII and
open it as UTF-8, that works. But it doesn't work the other way around:
a UTF-8 file is no longer valid ASCII as soon as it contains a
non-ASCII character.
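
The one-way compatibility can be checked with a rough sketch like the following. `isAscii` and `isUtf8` are hypothetical helpers written for this example; `std.utf.validate` is the Phobos function that throws a `UTFException` on malformed input.

```d
import std.algorithm.searching : all;
import std.utf : validate, UTFException;

// True if every byte is in the 7-bit ASCII range 0x00-0x7F.
bool isAscii(const(ubyte)[] bytes)
{
    return bytes.all!(b => b < 0x80);
}

// True if the bytes form well-formed UTF-8.
bool isUtf8(const(ubyte)[] bytes)
{
    try
    {
        validate(cast(const(char)[]) bytes);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

void main()
{
    ubyte[] asciiBytes = [0x68, 0x69]; // "hi"
    ubyte[] utf8Bytes = [0xC3, 0xA9];  // U+00E9 encoded as UTF-8

    // ASCII data is also valid UTF-8 ...
    assert(isAscii(asciiBytes) && isUtf8(asciiBytes));
    // ... but UTF-8 data with a non-ASCII character is not ASCII.
    assert(!isAscii(utf8Bytes) && isUtf8(utf8Bytes));
}
```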