[Issue 23186] wchar/dchar do not have their endianness defined
d-bugmail at puremagic.com
Wed Jun 15 21:44:37 UTC 2022
https://issues.dlang.org/show_bug.cgi?id=23186
--- Comment #4 from Richard Cattermole <alphaglosined at gmail.com> ---
(In reply to Dennis from comment #3)
> (In reply to Richard Cattermole from comment #2)
> > No, this isn't an ABI thing, it's about encodings.
>
> I don't follow, do you have a reference for me? I'm looking at:
>
> https://en.wikipedia.org/wiki/UTF-16
>
> "Each Unicode code point is encoded either as one or two 16-bit code units.
> How these 16-bit codes are stored as bytes then depends on the 'endianness'
> of the text file or communication protocol."
>
> The `wchar` type is an integer, the 16-bit code. No integral operations on a
> `wchar` reveal the endianness, only once you reinterpret cast 'the text
> file' (a `ubyte[]`) will endianness come up, but at that point I think it's
> no different than casting a `ubyte[]` to a `ushort[]`. We don't have BE and
> LE `short` types either.
Indeed. With integers you kind of expect the byte order to match the CPU's
endianness, but you cannot assume the same for UTF-encoded data (hence we
should document it).
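
To illustrate (a minimal sketch, not from the bug report): integral
operations on a `wchar` never expose byte order; only reinterpreting its
bytes does, exactly as with a `ushort`.

void main()
{
    wchar c = 0x0068; // 'h' as a single 16-bit code unit, i.e. an integer

    // Integral operations never expose byte order.
    assert(c + 1 == 0x0069);

    // Only reinterpreting the underlying bytes does, same as for ushort.
    ubyte[] bytes = cast(ubyte[]) (&c)[0 .. 1];
    version (LittleEndian)
        assert(bytes[0] == 0x68 && bytes[1] == 0x00); // UTF-16LE layout
    else
        assert(bytes[0] == 0x00 && bytes[1] == 0x68); // UTF-16BE layout
}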
> > However, it can be kept pretty simple something like `Unicode 8-bit code
> > point with matching target endian`.
>
> There's no endian difference for 8-bit code points, or are we talking about
> bit order instead of byte order?
That should have said UTF-16 or UTF-32, but it's the same point.
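
For data coming in from outside, the byte order is a property of the data
rather than of the `wchar` type, which is what documenting it would pin
down. A small sketch (the little-endian input here is hypothetical):

import std.bitmanip : swapEndian;

void main()
{
    // Hypothetical input: raw UTF-16LE bytes for "hi", e.g. read from a file.
    ubyte[] raw = [0x68, 0x00, 0x69, 0x00];

    // Reinterpreting the bytes as code units is where endianness matters,
    // exactly as it would for a ushort[].
    wchar[] units = cast(wchar[]) raw;

    // On a big-endian target the little-endian data must be byte-swapped
    // before it is valid UTF-16 in CPU order.
    version (BigEndian)
    {
        foreach (ref u; units)
            u = swapEndian(u);
    }

    assert(units == "hi"w);
}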