Why is char initialized to 0xFF?

James Blachly james.blachly at gmail.com
Sat Jun 8 17:55:07 UTC 2019


Disclaimer: I am not a Unicode expert.

Background: I have added UTF-8 character type support to lldb in 
conjunction with adding support for D string/wstring/dstring.

Dlang char is analogous to C++20 char8_t[1] AFAICT.

In C++20 a char8_t value-initializes to u8'\0', whereas in D 
char.init is '\xFF'[2]. Likewise, wchar.init is 0xFFFF and 
dchar.init is 0x0000FFFF.
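
For reference, a minimal sketch (assuming any recent D compiler) 
that prints these init values:

    import std.stdio;

    void main()
    {
        // Character types are integral in D, so %X prints the raw value.
        writefln("char.init  = 0x%02X", char.init);   // 0xFF
        writefln("wchar.init = 0x%04X", wchar.init);  // 0xFFFF
        writefln("dchar.init = 0x%08X", dchar.init);  // 0x0000FFFF
    }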

char holds a UTF-8 code unit, but the byte 0xFF is specifically 
forbidden[3] by the UTF-8 specification.
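
To illustrate the consequence, a minimal sketch: a default-initialized 
char buffer fails std.utf.validate, which throws a UTFException on 
malformed input:

    import std.stdio;
    import std.utf : validate, UTFException;

    void main()
    {
        char[4] buf;            // every element defaults to char.init (0xFF)
        try
        {
            validate(buf[]);    // throws: 0xFF is never a legal UTF-8 byte
        }
        catch (UTFException e)
        {
            writeln("invalid UTF-8: ", e.msg);
        }
    }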

What is the reasoning behind this? Is it related to zero-termination of 
C strings? Should it be considered for change?

It is surprising that these types do not initialize to the null 
character (U+0000), which is valid in all three UTF encodings.
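
By contrast, a NUL-filled buffer passes validation (minimal sketch):

    import std.utf : validate;

    void main()
    {
        char[4] zeros = '\0';   // fills every element with U+0000
        validate(zeros[]);      // passes: NUL is well-formed UTF-8
    }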

Kind regards
James


[1] http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0482r6.html
[2] https://dlang.org/spec/type.html
[3] https://en.wikipedia.org/wiki/UTF-8#Invalid_byte_sequences

