betterC becoming unusable
Paul Backus
snarwin at gmail.com
Mon Nov 7 10:58:44 UTC 2022
On Monday, 7 November 2022 at 09:50:47 UTC, rikki cattermole
wrote:
> But why on earth are UTF-8 strings special-cased and not
> UTF-16 or UTF-32?
>
> That's not right...
>
> No sanitization, it's basically just byte for byte comparison.
> Which is the wrong way to compare Unicode strings anyway (and
> requires the tables)!
https://dlang.org/spec/arrays.html#strings-unicode
> Note that built-in comparison operators operate on a code unit
> basis. The end result for valid strings is the same as that of
> code point for code point comparison as long as both strings
> are in the same normalization form. Since normalization is a
> costly operation not suitable for language primitives it's
> assumed to be enforced by the user.
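To make the normalization caveat concrete, here is a minimal sketch (not taken from the spec). It assumes full Phobos is available, so it will not build under `-betterC`, and `equalNormalized` is a hypothetical helper name used only for illustration:

```d
import std.uni : normalize, NFC;

// Hypothetical helper: compare two strings after normalizing both to NFC.
bool equalNormalized(string a, string b)
{
    return normalize!NFC(a) == normalize!NFC(b);
}

void main()
{
    string precomposed = "\u00E9";  // U+00E9, "e with acute" as one code point
    string decomposed = "e\u0301";  // 'e' followed by U+0301 combining acute

    // Built-in == works on code units, so the two spellings differ.
    assert(precomposed != decomposed);
    assert(precomposed.length == 2); // two UTF-8 code units
    assert(decomposed.length == 3);  // one for 'e', two for U+0301

    // Once both sides are in the same normalization form, they match.
    assert(equalNormalized(precomposed, decomposed));
}
```

The built-in operator never performs the `normalize` step itself; as the spec says, that part is left to the user.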
Also, practically speaking, lots of D code uses `char[]`
interchangeably with `ubyte[]` and does zero UTF-8 validation,
especially code that interoperates with C libraries. So the idea
that "D strings are Unicode" is little more than a polite fiction.
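A rough illustration of that, assuming Phobos' `std.utf.validate` as the checker (`isValidUtf8` is a hypothetical wrapper, not a library function):

```d
import std.utf : validate, UTFException;

// Hypothetical wrapper: true if s is well-formed UTF-8, false otherwise.
bool isValidUtf8(const(char)[] s)
{
    try
    {
        validate(s); // throws UTFException on malformed input
        return true;
    }
    catch (UTFException e)
    {
        return false;
    }
}

void main()
{
    // 0xFF can never appear in well-formed UTF-8.
    ubyte[] raw = [0x66, 0x6F, 0xFF];

    // Reinterpreting the bytes as char[] compiles and runs with no check.
    char[] text = cast(char[]) raw;
    assert(text.length == 3);

    // Only an explicit validation step notices the problem.
    assert(!isValidUtf8(text));
    assert(isValidUtf8("foo"));
}
```

Nothing in the cast or in later slicing and comparison complains about the bad byte; it only surfaces if some code decodes or validates the data.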