First Impressions!
Patrick Schluter
Patrick.Schluter at bbox.fr
Thu Nov 30 18:20:00 UTC 2017
On Thursday, 30 November 2017 at 17:40:08 UTC, Jonathan M Davis wrote:
> English and thus don't as easily hit the cases where their code
> is wrong. For better or worse, UTF-16 hides it better than
> UTF-8, but the problem exists in both.
>
To give just one example of what can go wrong with UTF-16: reading
a file in UTF-16 and converting it to something else, like UTF-8 or
UTF-32, block by block. If an SMP code point lands exactly on the
buffer limit, its high surrogate ends up at the end of the first
buffer and its low surrogate at the start of the next. If you don't
handle that case, you get 2 invalid characters instead of your nice
poop 💩 emoji character (emojis are in the SMP, and they are becoming
more and more frequent).
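A minimal sketch of the failure and one possible fix, in Python rather than D. It assumes the input is UTF-16LE without a BOM, and the helpers `naive_decode` and `carry_over_decode` are hypothetical names for illustration. The fix simply holds back a trailing high surrogate (or an odd trailing byte) and prepends it to the next block before decoding.

```python
POOP = "\U0001F4A9"  # U+1F4A9 PILE OF POO, an SMP code point (surrogate pair in UTF-16)


def naive_decode(chunks):
    # Decodes every block on its own: a surrogate pair split across two
    # blocks becomes two U+FFFD replacement characters.
    return "".join(c.decode("utf-16-le", errors="replace") for c in chunks)


def carry_over_decode(chunks):
    # Decodes only complete code units, holding back a dangling high
    # surrogate (and any odd trailing byte) until the next block arrives.
    out = []
    pending = b""
    for chunk in chunks:
        data = pending + chunk
        cut = len(data) - (len(data) % 2)  # keep whole 16-bit units only
        if cut >= 2:
            last_unit = int.from_bytes(data[cut - 2:cut], "little")
            if 0xD800 <= last_unit <= 0xDBFF:  # high surrogate: wait for its partner
                cut -= 2
        out.append(data[:cut].decode("utf-16-le", errors="replace"))
        pending = data[cut:]
    if pending:  # stream ended mid-pair: genuinely invalid input
        out.append(pending.decode("utf-16-le", errors="replace"))
    return "".join(out)


if __name__ == "__main__":
    text = "ab" + POOP + "c"
    data = text.encode("utf-16-le")
    split = [data[:6], data[6:]]  # block limit falls between the two surrogates
    print(repr(naive_decode(split)))
    print(repr(carry_over_decode(split)))
```

With the split shown, `naive_decode` yields two replacement characters where the emoji should be, while `carry_over_decode` restores it. In real Python code, `codecs.getincrementaldecoder("utf-16-le")` already does this buffering for you; the hand-rolled version is only there to make the carry-over step visible.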