UTF-16 endianness
Johannes Pfau via Digitalmars-d-learn
digitalmars-d-learn at puremagic.com
Fri Jan 29 16:14:23 PST 2016
On Fri, 29 Jan 2016 18:58:17 -0500,
Steven Schveighoffer <schveiguy at yahoo.com> wrote:
> On 1/29/16 6:03 PM, Marek Janukowicz wrote:
> > On Fri, 29 Jan 2016 17:43:26 -0500, Steven Schveighoffer wrote:
> >>> Is there anything I should know about UTF endianness?
> >>
> >> It's not any different from other endianness.
> >>
> >> In other words, a UTF-16 code unit is expected to be in the
> >> endianness of the platform you are running on.
> >>
> >> If you are on x86 or x86_64 (very likely), then it should be
> >> little endian.
> >>
> >> If your source of data is big-endian (or opposite from your native
> >> endianness),
> >
> > To be precise: my case is IMAP UTF-7 folder name encoding, and I
> > finally found out it's indeed big endian, which explains my problem
> > (since I'm on x86_64).
> >> it will have to be converted before treating it as a wchar[].
> >
> > Is there any clever way to do the conversion? Or do I need to swap
> > the bytes manually?
>
> No clever way, just the straightforward way ;)
>
> Swapping the endianness of 32-bit values can be done with
> core.bitop.bswap. For 16-bit values, I believe you have to do the
> bit shifting yourself.
> Something like:
>
> foreach (ref elem; wcharArr)
>     elem = cast(wchar)(((elem << 8) & 0xff00) | ((elem >> 8) & 0x00ff));
>
> Or you can do it with the bytes directly, before casting to wchar[].
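A minimal sketch of the byte-level approach, assuming the big-endian data arrives as a `ubyte[]` with an even length (for example, the bytes obtained by base64-decoding a modified UTF-7 mailbox name). The helper name `beBytesToWchars` is hypothetical. Each code unit is assembled arithmetically from its high and low byte, so the same code gives the right result on both little- and big-endian hosts without an explicit swap:

```d
import std.exception : enforce;

/// Hypothetical helper: turn big-endian UTF-16 bytes into native wchar code units.
/// Building each unit from (high << 8) | low is independent of host byte order.
wchar[] beBytesToWchars(const(ubyte)[] raw)
{
    enforce(raw.length % 2 == 0, "UTF-16 data must have an even number of bytes");
    auto result = new wchar[raw.length / 2];
    foreach (i, ref unit; result)
    {
        // In big-endian order the high byte comes first.
        unit = cast(wchar)((raw[2 * i] << 8) | raw[2 * i + 1]);
    }
    return result;
}

void main()
{
    import std.conv : to;
    import std.stdio : writeln;

    // "Hé" encoded as UTF-16BE: U+0048, U+00E9
    const ubyte[] raw = [0x00, 0x48, 0x00, 0xE9];
    wchar[] units = beBytesToWchars(raw);
    writeln(units.to!string); // prints "Hé"
}
```

Surrogate pairs need no special handling here: each 16-bit unit is converted on its own, and `to!string` combines the pair when transcoding to UTF-8.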
There's also a Phobos solution: bigEndianToNative in std.bitmanip.
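A sketch using `bigEndianToNative`, assuming the raw data is a `ubyte[]` of even length; `utf16beToWchars` is a hypothetical name. `bigEndianToNative!T` takes a fixed-size `ubyte[T.sizeof]` rather than a slice, so each pair of bytes is copied into a `ubyte[2]` first:

```d
import std.bitmanip : bigEndianToNative;
import std.exception : enforce;

/// Hypothetical helper built on Phobos: decode UTF-16BE bytes to native wchar units.
wchar[] utf16beToWchars(const(ubyte)[] raw)
{
    enforce(raw.length % 2 == 0, "UTF-16 data must have an even number of bytes");
    auto result = new wchar[raw.length / 2];
    foreach (i, ref unit; result)
    {
        // bigEndianToNative expects a static ubyte[2] for wchar, not a slice.
        ubyte[2] pair = [raw[2 * i], raw[2 * i + 1]];
        unit = bigEndianToNative!wchar(pair);
    }
    return result;
}

void main()
{
    import std.conv : to;
    import std.stdio : writeln;

    // "Hé" encoded as UTF-16BE: U+0048, U+00E9
    const ubyte[] raw = [0x00, 0x48, 0x00, 0xE9];
    writeln(utf16beToWchars(raw).to!string); // prints "Hé"
}
```

On a big-endian host `bigEndianToNative` leaves the bytes in place, and on x86/x86_64 it swaps them, so the caller does not need a `version(LittleEndian)` branch.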