The D Programming Language Vision Document

Wed Jul 6 21:30:44 UTC 2022

On Sunday, 3 July 2022 at 20:16:35 UTC, Ola Fosheim Grøstad wrote:
> On Sunday, 3 July 2022 at 19:32:56 UTC, rikki cattermole wrote:
>> It is required for string equivalence comparisons (which is 
>> what you should be doing in a LOT more cases!). Anything user 
>> provided should be normalized first when compared.
>
> Well, I think it is reasonable for a protocol to require that 
> the input is NFC, and just check it and reject it or call out 
> to an external library to convert it into NFC.
>
> Anyway, UTF-8 is the only format that isn't affected by network 
> byte order… So if you support more than UTF-8 then you have to 
> support UTF-8, UTF-16LE, UTF-16BE, UTF-32LE, UTF-32BE…

It is pretty easy to convert those to native endian and back with 
the functions in `std.bitmanip`. I recently did so in a program 
that recognises files in all five of those encodings.
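To illustrate the kind of conversion I mean, here is a minimal sketch, not the code from my program. `ByteOrder` and `utf16BytesToUTF8` are hypothetical names made up for this example; only `bigEndianToNative`, `littleEndianToNative` and `toUTF8` come from Phobos. It assumes the byte order is already known (for example from a BOM) and that any BOM has been stripped.

```d
import std.bitmanip : bigEndianToNative, littleEndianToNative;
import std.utf : toUTF8;

/// Hypothetical enum for this sketch: byte order of the incoming data.
enum ByteOrder { little, big }

/// Hypothetical helper: turns raw UTF-16 bytes of a known byte order
/// into a native UTF-8 string.
string utf16BytesToUTF8(const(ubyte)[] bytes, ByteOrder order)
{
    assert(bytes.length % 2 == 0, "UTF-16 input needs an even byte count");

    auto units = new wchar[](bytes.length / 2);
    foreach (i, ref unit; units)
    {
        // std.bitmanip works on fixed-size byte arrays, so copy one
        // code unit (two bytes) at a time.
        ubyte[2] pair;
        pair[0] = bytes[2 * i];
        pair[1] = bytes[2 * i + 1];

        unit = order == ByteOrder.big
            ? bigEndianToNative!wchar(pair)
            : littleEndianToNative!wchar(pair);
    }

    // toUTF8 validates the UTF-16 and throws on malformed input.
    return toUTF8(units);
}
```

UTF-32 would work the same way with `dchar` and four bytes per code unit.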

Also, the Phobos functions are of high quality. They work 
extremely well with the range API (other than having to live with 
autodecoding), they are well documented and they are 
comprehensive enough for almost any task. I don't recall having 
ever considered another library for handling Unicode.

And I think there is still considerable value in handling UTF-16 
strings, because that's what many other languages use. With the 
current vision, Phobos V2 won't handle UTF-16 in place. We'll have 
to convert it to UTF-8 before manipulation, which is probably not 
optimal. And if the string functions have to deal with two 
formats anyway, also supporting UTF-32 on top of them probably 
does not make much difference.
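A small sketch of the difference, using today's Phobos. The two function names are hypothetical and exist only for this example; `toUpper`, `toUTF8` and `toUTF16` are the existing `std.uni` and `std.utf` functions. How Phobos V2 would actually expose this is an assumption on my part.

```d
import std.uni : toUpper;
import std.utf : toUTF8, toUTF16;

/// Hypothetical name: works on the UTF-16 string directly, as
/// current Phobos allows.
wstring upperInPlace(wstring s)
{
    return s.toUpper;
}

/// Hypothetical name: what a UTF-8-only library would force on us,
/// i.e. transcode, manipulate, transcode back.
wstring upperViaUTF8(wstring s)
{
    return s.toUTF8.toUpper.toUTF16;
}
```

Both give the same result, but the second one allocates and transcodes twice just to get back to the encoding it started in.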

That said, I don't feel strongly about this, because if we kick 
the UTF-16 and UTF-32 functions out of Phobos, they will 
presumably still be available in Undead.