Why is char initialized to 0xFF ?
James Blachly
james.blachly at gmail.com
Sat Jun 8 17:55:07 UTC 2019
Disclaimer: I am not a unicode expert.
Background: I have added UTF-8 character type support to lldb in
conjunction with adding support for D string/wstring/dstring.
Dlang char is analogous to C++20 char8_t[1] AFAICT.
The default initialization value in C++20 is u8'\0', whereas in D
char.init is '\xFF'[2]. Likewise, wchar.init is 0xFFFF and dchar.init
is 0x0000FFFF.
char holds a UTF-8 code unit, but the byte 0xFF is specifically
forbidden[3] by the UTF-8 specification: it can never appear anywhere
in a well-formed UTF-8 sequence.
What is the reasoning behind this? Is it related to zero-termination of
C strings? Should it be considered for change?
It is surprising that these types do not default-initialize to the null
character, which is valid in every UTF encoding.
Kind regards
James
[1] http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0482r6.html
[2] https://dlang.org/spec/type.html
[3] https://en.wikipedia.org/wiki/UTF-8#Invalid_byte_sequences