[Issue 23186] wchar/dchar do not have their endianness defined
d-bugmail at puremagic.com
Wed Jun 15 21:44:37 UTC 2022
https://issues.dlang.org/show_bug.cgi?id=23186
--- Comment #4 from Richard Cattermole <alphaglosined at gmail.com> ---
(In reply to Dennis from comment #3)
> (In reply to Richard Cattermole from comment #2)
> > No, this isn't an ABI thing, it's about encodings.
>
> I don't follow, do you have a reference for me? I'm looking at:
>
> https://en.wikipedia.org/wiki/UTF-16
>
> "Each Unicode code point is encoded either as one or two 16-bit code units.
> How these 16-bit codes are stored as bytes then depends on the 'endianness'
> of the text file or communication protocol."
>
> The `wchar` type is an integer, the 16-bit code. No integral operations on a
> `wchar` reveal the endianness, only once you reinterpret cast 'the text
> file' (a `ubyte[]`) will endianness come up, but at that point I think it's
> no different than casting a `ubyte[]` to a `ushort[]`. We don't have BE and
> LE `short` types either.
Indeed. With integers you kind of expect the byte order to match the CPU's
endianness, but you cannot assume the same for UTF-encoded data (hence we
should document it).
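
To illustrate (a minimal sketch, not from the bug report): integral
operations on a `wchar` never expose byte order; only reinterpreting its
bytes does, exactly as with a `ushort`.

void main()
{
    wchar c = 0x0068; // 'h' as a single 16-bit code unit, i.e. an integer

    // Integral operations never expose byte order.
    assert(c + 1 == 0x0069);

    // Only reinterpreting the underlying bytes does, same as for ushort.
    ubyte[] bytes = cast(ubyte[]) (&c)[0 .. 1];
    version (LittleEndian)
        assert(bytes[0] == 0x68 && bytes[1] == 0x00); // UTF-16LE layout
    else
        assert(bytes[0] == 0x00 && bytes[1] == 0x68); // UTF-16BE layout
}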
> > However, it can be kept pretty simple something like `Unicode 8-bit code
> > point with matching target endian`.
>
> There's no endian difference for 8-bit code points, or are we talking about
> bit order instead of byte order?
That should have said UTF-16 or UTF-32, but it's the same point.
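
For data coming in from outside, the byte order is a property of the data
rather than of the `wchar` type, which is what documenting it would pin
down. A small sketch (the little-endian input here is hypothetical):

import std.bitmanip : swapEndian;

void main()
{
    // Hypothetical input: raw UTF-16LE bytes for "hi", e.g. read from a file.
    ubyte[] raw = [0x68, 0x00, 0x69, 0x00];

    // Reinterpreting the bytes as code units is where endianness matters,
    // exactly as it would for a ushort[].
    wchar[] units = cast(wchar[]) raw;

    // On a big-endian target the little-endian data must be byte-swapped
    // before it is valid UTF-16 in CPU order.
    version (BigEndian)
    {
        foreach (ref u; units)
            u = swapEndian(u);
    }

    assert(units == "hi"w);
}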