UTF-16 endianness

Johannes Pfau via Digitalmars-d-learn digitalmars-d-learn at puremagic.com
Fri Jan 29 16:14:23 PST 2016


On Fri, 29 Jan 2016 18:58:17 -0500,
Steven Schveighoffer <schveiguy at yahoo.com> wrote:

> On 1/29/16 6:03 PM, Marek Janukowicz wrote:
> > On Fri, 29 Jan 2016 17:43:26 -0500, Steven Schveighoffer wrote:  
> >>> Is there anything I should know about UTF endianness?  
> >>
> >> It's not any different from other endianness.
> >>
> >> In other words, a UTF16 code unit is expected to be in the
> >> endianness of the platform you are running on.
> >>
> >> If you are on x86 or x86_64 (very likely), then it should be
> >> little endian.
> >>
> >> If your source of data is big-endian (or opposite from your native
> >> endianness),  
> >
> > To be precise - my case is IMAP UTF7 folder name encoding and I
> > finally found out it's indeed big endian, which explains my problem
> > (as I'm indeed on x86_64). 
> >> it will have to be converted before treating as a wchar[].  
> >
> > Is there any clever way to do the conversion? Or do I need to swap
> > the bytes manually?  
> 
> No clever way, just the straightforward way ;)
> 
> Swapping the endianness of a 32-bit value can be done with
> core.bitop.bswap. For 16 bits I believe you have to do the bit
> shifting yourself. Something like:
> 
> foreach(ref elem; wcharArr)
>     elem = ((elem << 8) & 0xff00) | ((elem >> 8) & 0x00ff);
> 
> Or you can do it with the bytes directly before casting.


There's also a Phobos solution: bigEndianToNative in std.bitmanip.
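
A minimal sketch of how that could look, assuming the big-endian
UTF-16 data is available as a ubyte buffer (for example after
base64-decoding the IMAP folder name). The helper name decodeUtf16BE is
made up for this example and is not part of Phobos; only
bigEndianToNative comes from std.bitmanip.

```d
import std.bitmanip : bigEndianToNative;

/// Hypothetical helper (not in Phobos): converts big-endian UTF-16
/// bytes into native-endian wchar code units.
wchar[] decodeUtf16BE(const(ubyte)[] bytes)
{
    assert(bytes.length % 2 == 0, "UTF-16 data must have an even byte count");
    auto result = new wchar[](bytes.length / 2);
    foreach (i, ref unit; result)
    {
        // bigEndianToNative expects a static array of exactly
        // T.sizeof bytes, so copy each pair into a ubyte[2] first.
        ubyte[2] pair;
        pair[0] = bytes[2 * i];
        pair[1] = bytes[2 * i + 1];
        unit = bigEndianToNative!wchar(pair);
    }
    return result;
}

void main()
{
    // "Hi" encoded as big-endian UTF-16.
    immutable(ubyte)[] raw = [0x00, 0x48, 0x00, 0x69];
    assert(decodeUtf16BE(raw) == "Hi"w);
}
```

Unlike an unconditional byte swap, this should also do the right thing
on a big-endian host, where bigEndianToNative leaves the byte order
alone.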


