DMD: invalid UTF character `\U0000d800`

Jacob Carlborg doob at me.com
Sat Nov 7 17:49:54 UTC 2020


On Saturday, 7 November 2020 at 16:12:06 UTC, Per Nordlöw wrote:

>  CtoLexer_parser.d   665  57 error           invalid UTF character \U0000d800
>  CtoLexer_parser.d   665  67 error           invalid UTF character \U0000dbff
>  CtoLexer_parser.d   666  28 error           invalid UTF character \U0000d800
>  CtoLexer_parser.d   666  38 error           invalid UTF character \U0000dbff
>  CtoLexer_parser.d   666  53 error           invalid UTF character \U0000dc00
>  CtoLexer_parser.d   666  63 error           invalid UTF character \U0000dfff
>
> Doesn't DMD support these Unicodes yet?

They're not valid characters. U+D800 through U+DFFF are the 
surrogate code points:

"The Unicode standard permanently reserves these code point 
values for UTF-16 encoding of the high and low surrogates, and 
they will never be assigned a character, so there should be no 
reason to encode them. The official Unicode standard says that no 
UTF forms, including UTF-16, can encode these code points" [1].

"... the standard states that such arrangements should be treated 
as encoding errors" [1].

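In UTF-16, a high surrogate (D800 to DBFF) is only meaningful when 
followed by a low surrogate (DC00 to DFFF); the pair together 
encodes one code point above U+FFFF. If the intent was to express 
such a character, it probably needs to be written as that single 
code point rather than as the two surrogates.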
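
To illustrate the difference between a surrogate and the character 
a surrogate pair encodes, here is a minimal sketch using `std.utf`. 
The helper `utf16Units` is a hypothetical name made up for this 
example and is not part of Phobos, and U+1F600 is just an arbitrary 
character outside the BMP.

```d
import std.utf : encode, isValidDchar;

// Hypothetical helper (not part of Phobos): the UTF-16 code units for `c`.
wchar[] utf16Units(dchar c)
{
    wchar[2] buf;
    immutable n = encode(buf, c); // number of code units written, 1 or 2
    return buf[0 .. n].dup;
}

void main()
{
    // Surrogate code points are not valid characters on their own.
    assert(!isValidDchar(cast(dchar) 0xD800));
    assert(!isValidDchar(cast(dchar) 0xDFFF));

    // A character outside the BMP is written as one code point...
    dchar c = '\U0001F600';
    assert(isValidDchar(c));

    // ...and only its UTF-16 encoding contains a surrogate pair.
    auto units = utf16Units(c);
    assert(units.length == 2);
    assert(units[0] == 0xD83D); // high surrogate
    assert(units[1] == 0xDE00); // low surrogate
}
```

So the surrogates show up as `wchar` code units after encoding, 
not as something you write with a `\U` escape.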

[1] https://en.wikipedia.org/wiki/UTF-16#U+D800_to_U+DFFF

--
/Jacob Carlborg



