The D Programming Language Vision Document

Sun Jul 3 21:01:49 UTC 2022

On Sunday, 3 July 2022 at 20:28:18 UTC, rikki cattermole wrote:
> We only support UTF-16/UTF-32 for the target endian.
>
> Text input comes from many sources, stdin, files and say the 
> windowing system are three common sources that do not make any 
> such guarantees.

Well, then the application author will use an external Unicode 
library anyway. If you support UTF-16 or UTF-32 there might not 
be a BOM, so you might need to use heuristics to figure out 
whether the data is little-endian (LE) or big-endian (BE).
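
To make the heuristic point concrete, here is a rough sketch in 
Python (picked only for brevity). The function name 
`guess_utf16_encoding` is made up for illustration, and the 
zero-byte counting rule is just one simple heuristic that assumes 
mostly ASCII/Latin text; it is not how any particular library 
does it. It only looks at UTF-16, since a UTF-32 LE BOM starts 
with the same two bytes as the UTF-16 LE BOM and would need a 
separate check first.

```python
import codecs
from typing import Optional


def guess_utf16_encoding(data: bytes) -> Optional[str]:
    """Guess the byte order of UTF-16 data.

    Returns "utf-16-le", "utf-16-be", or None when undecided.
    Hypothetical helper: a sketch, not a robust detector.
    """
    # 1. Trust a BOM when one is present.
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"

    # 2. No BOM: fall back to a heuristic. For mostly ASCII/Latin
    #    text the high byte of each 16-bit code unit is zero.
    #    In little-endian data the high byte sits at odd offsets,
    #    in big-endian data at even offsets.
    zeros_at_even = data[0::2].count(0)
    zeros_at_odd = data[1::2].count(0)
    if zeros_at_odd > zeros_at_even:
        return "utf-16-le"
    if zeros_at_even > zeros_at_odd:
        return "utf-16-be"

    # 3. No signal either way (empty input, CJK-heavy text, ...).
    return None
```

The last branch is the annoying part: without a BOM and without 
a clear signal the caller still has to pick a default or ask the 
user, which is why a well-tested library is attractive here.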

For things like gzip, PNG, crypto and Unicode there are most 
likely faster and better tested open source alternatives than 
what a small community can come up with. Maybe just use whatever 
Chromium or Clang uses?

What I never liked about C++ is the string mess: char, signed 
char, unsigned char, char8_t, char16_t, char32_t, wchar_t, 
string, wstring, u8string, u16string, u32string, pmr::string, 
pmr::wstring, pmr::u8string, pmr::u16string, pmr::u32string… And 
this doesn't even account for endianness! This is what happens 
over time as new needs pop up. One of the best things about 
Python 3 and JavaScript is that there is one commonly used string 
type that is well supported.

Having one common string representation is a good thing for API 
authors.

(But make sure to have a maintained binding to a versatile C 
Unicode library.)