betterC becoming unusable

Paul Backus snarwin at gmail.com
Mon Nov 7 10:58:44 UTC 2022


On Monday, 7 November 2022 at 09:50:47 UTC, rikki cattermole 
wrote:
> But why on earth is UTF-8 strings specially cased and not 
> UTF-16 or UTF-32?
>
> That's not right...
>
> No sanitization, it's basically just byte for byte comparison. 
> Which is the wrong way to compare Unicode strings anyway (and 
> requires the tables)!

https://dlang.org/spec/arrays.html#strings-unicode

> Note that built-in comparison operators operate on a code unit 
> basis. The end result for valid strings is the same as that of 
> code point for code point comparison as long as both strings 
> are in the same normalization form. Since normalization is a 
> costly operation not suitable for language primitives it's 
> assumed to be enforced by the user.
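To make the spec's point concrete, here is a minimal sketch, assuming a D compiler with Phobos. It uses `std.uni.normalize` to show that two encodings of "é" compare unequal under the built-in code-unit `==`, and compare equal only once the user has brought both into the same normalization form (NFC here):

```d
import std.uni : normalize, NFC;

void main()
{
    // U+00E9 LATIN SMALL LETTER E WITH ACUTE: one code point,
    // encoded in UTF-8 as two code units (0xC3 0xA9). This is NFC form.
    string precomposed = "\u00E9";

    // 'e' followed by U+0301 COMBINING ACUTE ACCENT: two code points,
    // encoded as three code units (0x65 0xCC 0x81). This is NFD form.
    string decomposed = "e\u0301";

    // Both render as the same glyph, but == compares code unit by code unit.
    assert(precomposed.length == 2);
    assert(decomposed.length == 3);
    assert(precomposed != decomposed);

    // Once both strings are in the same normalization form, the built-in
    // comparison gives the same answer as a code point comparison would.
    assert(normalize!NFC(decomposed) == precomposed);
    assert(normalize!NFC(precomposed) == precomposed);
}
```

The normalization step is the part the spec leaves to the user; the language primitive itself never performs it.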

Also, practically speaking, lots of D code uses `char[]` 
interchangeably with `ubyte[]` and does zero UTF-8 validation, 
especially code that interoperates with C libraries. So the idea 
that "D strings are unicode" is little more than a polite fiction.
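A minimal sketch of that pattern, assuming a D compiler with Phobos; the byte values are made up for illustration and stand in for data received from a C library. Casting `ubyte[]` to `char[]` reinterprets the memory without any check, and invalid UTF-8 is only detected if the code explicitly calls something like `std.utf.validate`:

```d
import std.exception : assertThrown;
import std.utf : UTFException, validate;

void main()
{
    // Bytes as they might arrive from a C API or a file:
    // "fo" followed by 0xFF, which can never appear in valid UTF-8.
    ubyte[] raw = [0x66, 0x6F, 0xFF];

    // The cast only reinterprets the slice; nothing inspects the contents.
    char[] s = cast(char[]) raw;
    assert(s.length == 3);
    assert(s[0 .. 2] == "fo");

    // The invalid byte is caught only when validation is requested explicitly.
    assertThrown!UTFException(validate(s));
}
```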