betterC becoming unusable

Mon Nov 7 12:46:08 UTC 2022

On Monday, 7 November 2022 at 10:58:44 UTC, Paul Backus wrote:
> On Monday, 7 November 2022 at 09:50:47 UTC, rikki cattermole 
> wrote:
>> But why on earth is UTF-8 strings specially cased and not 
>> UTF-16 or UTF-32?
>>
>> That's not right...
>>
>> No sanitization, it's basically just byte for byte comparison. 
>> Which is the wrong way to compare Unicode strings anyway (and 
>> requires the tables)!
>
> https://dlang.org/spec/arrays.html#strings-unicode
>
>> Note that built-in comparison operators operate on a code unit 
>> basis. The end result for valid strings is the same as that of 
>> code point for code point comparison as long as both strings 
>> are in the same normalization form. Since normalization is a 
>> costly operation not suitable for language primitives it's 
>> assumed to be enforced by the user.
>
> Also, practically speaking, lots of D code uses `char[]` 
> interchangeably with `ubyte[]` and does zero UTF-8 validation, 
> especially code that interoperates with C libraries. So the 
> idea that "D strings are unicode" is little more than a polite 
> fiction.

That doesn't mean they are doing it right. They should just 
replace every `char*` with `ubyte*` in the D declarations for any 
C header they use. Any use of that data as a string would then 
need an explicit cast, and the compiler could flag the places 
where it is passed to string-only functions.

It would be best if mixing non-UTF and UTF data were not so easy 
to do; that would force the user to think twice.
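
A rough sketch of the idea, not a finished design: if raw bytes are typed as `ubyte`, the compiler already refuses to treat them as text, and the one explicit cast can sit next to a validation step. The helper name `asUtf8` is made up for this example, and it uses `std.utf.validate` from Phobos, which throws, so this particular version is not betterC-compatible.

```d
import std.utf : validate, UTFException;

// Byte buffers do not implicitly convert to string types,
// so accidental mixing is rejected at compile time.
static assert(!is(ubyte[] : string));
static assert(!is(ubyte[] : const(char)[]));
static assert(!is(ubyte* : const(char)*));

/// Hypothetical helper: the single place where raw bytes become text.
const(char)[] asUtf8(const(ubyte)[] raw)
{
    auto text = cast(const(char)[]) raw; // explicit, easy-to-find cast
    validate(text);                      // throws UTFException on invalid UTF-8
    return text;
}

void main()
{
    import std.exception : assertThrown;

    immutable(ubyte)[] good = [0x68, 0x69]; // the bytes of "hi"
    immutable(ubyte)[] bad  = [0xFF, 0xFE]; // not valid UTF-8

    assert(asUtf8(good) == "hi");
    assertThrown!UTFException(asUtf8(bad));
}
```

With `char*` in the binding, none of those static asserts would hold and the bytes would flow into string functions without anyone noticing.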