Why can't D store all UTF-8 code units in char type? (not really understanding explanation)

ag0aep6g anonymous at example.com
Sat Dec 3 13:02:27 UTC 2022


On 02.12.22 22:39, thebluepandabear wrote:
> Hm, that specifically might not be. The thing is, I thought a UTF-8 code 
> unit can store 1-4 bytes for each character, so how is it right to say 
> that `char` is a utf-8 code unit, it seems like it's just an ASCII code 
> unit.

You're simply not using the term "code unit" correctly. A UTF-8 code 
unit is just one of those 1-4 bytes. Together they form a "sequence" 
that encodes a single "code point". A `char` holds exactly one code 
unit, so a non-ASCII code point needs several `char`s.
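A small D sketch of that distinction. The sample string is arbitrary, chosen so its code points need 1, 2, 3 and 4 code units respectively. `.length` counts `char`s (code units), while `std.utf.count` decodes the sequences and counts code points:

```d
import std.stdio : writefln, writeln;
import std.string : representation;
import std.utf : codeLength, count;

void main()
{
    // In D, `string` is immutable(char)[], and each `char` is one
    // UTF-8 code unit: a single byte, not a whole character.
    string s = "añ€😀";

    // .length counts code units (bytes): 1 + 2 + 3 + 4
    writeln(s.length);                          // 10

    // std.utf.count decodes the sequences and counts code points
    writeln(s.count);                           // 4

    // The three code units that together encode U+20AC '€'
    writefln("%(%02X %)", "€".representation);  // E2 82 AC

    // foreach with a dchar loop variable decodes one sequence at a time
    foreach (dchar c; s)
        writefln("U+%04X uses %s code unit(s)", cast(uint) c, codeLength!char(c));
}
```

Iterating with `dchar` is one way to get at whole code points; indexing `s[i]` gives you a single code unit, which for `ñ`, `€` or `😀` is only part of a sequence.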

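And all (true) ASCII code units are indeed also valid UTF-8 code units, 
because UTF-8 is a superset of ASCII. If you save a file as ASCII and 
open it as UTF-8, that works. But it doesn't work the other way around: 
UTF-8 encodes every non-ASCII code point with bytes in the range 
0x80-0xFF, and those are not valid ASCII.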
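A minimal D sketch of the one-way compatibility. The helper names `isPureASCII` and `isValidUTF8` are made up for this example; the check itself uses `std.utf.validate`, which throws a `UTFException` on malformed input. The Latin-1 case is an assumption about what "(true)" is excluding: 8-bit "extended ASCII" encodings such as Latin-1 are not ASCII and are generally not valid UTF-8 either.

```d
import std.stdio : writeln;
import std.string : representation;
import std.utf : UTFException, validate;

/// Hypothetical helper: true if every byte is 7-bit ASCII (0x00-0x7F).
bool isPureASCII(const(ubyte)[] bytes)
{
    foreach (b; bytes)
        if (b >= 0x80)
            return false;
    return true;
}

/// Hypothetical helper: true if the bytes form well-formed UTF-8.
bool isValidUTF8(const(ubyte)[] bytes)
{
    try
    {
        validate(cast(const(char)[]) bytes);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

void main()
{
    auto ascii = "Hello".representation;  // every byte is below 0x80
    auto utf8  = "héllo".representation;  // 'é' is encoded as 0xC3 0xA9
    // "héllo" in Latin-1: 0xE9 is a single byte, not a UTF-8 sequence
    immutable(ubyte)[] latin1 = [0x68, 0xE9, 0x6C, 0x6C, 0x6F];

    writeln(isPureASCII(ascii), " ", isValidUTF8(ascii));    // true true
    writeln(isPureASCII(utf8), " ", isValidUTF8(utf8));      // false true
    writeln(isPureASCII(latin1), " ", isValidUTF8(latin1));  // false false
}
```

ASCII bytes pass both checks, UTF-8 text with a non-ASCII character is valid UTF-8 but not ASCII, and the Latin-1 byte sequence is neither.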


More information about the Digitalmars-d-learn mailing list