UTF-16 endianness
Steven Schveighoffer via Digitalmars-d-learn
digitalmars-d-learn at puremagic.com
Fri Jan 29 15:58:17 PST 2016
On 1/29/16 6:03 PM, Marek Janukowicz wrote:
> On Fri, 29 Jan 2016 17:43:26 -0500, Steven Schveighoffer wrote:
>>> Is there anything I should know about UTF endianness?
>>
>> It's not any different from other endianness.
>>
>> In other words, a UTF16 code unit is expected to be in the endianness of
>> the platform you are running on.
>>
>> If you are on x86 or x86_64 (very likely), then it should be little endian.
>>
>> If your source of data is big-endian (or opposite from your native
>> endianness),
>
> To be precise - my case is IMAP UTF7 folder name encoding and I finally found
> out it's indeed big endian, which explains my problem (as I'm indeed on x86_64).
>
>> it will have to be converted before treating as a wchar[].
>
> Is there any clever way to do the conversion? Or do I need to swap the bytes
> manually?
No clever way, just the straightforward way ;)
Swapping the endianness of a 32-bit value can be done with core.bitop.bswap.
With 16 bits I believe you have to do bit shifting. Something like:

    foreach(ref elem; wcharArr)
        elem = ((elem << 8) & 0xff00) | ((elem >> 8) & 0x00ff);

Or you can do it with the bytes directly before casting.
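To illustrate the "bytes directly" route, here is a minimal sketch that assumes the decoded data arrives as a buffer of big-endian byte pairs. The helper name fromBigEndianBytes is made up for this example and is not part of Phobos or druntime. Because it assembles each code unit from its high and low byte, it gives the same result on little-endian and big-endian machines.

```d
// Hypothetical helper (not from the thread or any library): builds
// native-endian UTF-16 code units from big-endian byte pairs.
wchar[] fromBigEndianBytes(const(ubyte)[] bytes)
{
    assert(bytes.length % 2 == 0, "UTF-16 data must have an even byte count");
    auto result = new wchar[bytes.length / 2];
    foreach (i, ref unit; result)
    {
        // In big-endian order the first byte is the high half of the code unit.
        unit = cast(wchar)((bytes[2 * i] << 8) | bytes[2 * i + 1]);
    }
    return result;
}

void main()
{
    // "Hi" encoded as UTF-16BE: 0x0048 0x0069
    ubyte[] raw = [0x00, 0x48, 0x00, 0x69];
    assert(fromBigEndianBytes(raw) == "Hi"w);
}
```

This allocates a new array rather than swapping in place, which avoids reinterpreting a ubyte[] as a wchar[] before the bytes are in native order.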
>
>> Note the version identifiers BigEndian and LittleEndian can be used to
>> compile the correct code.
>
> This solution is of no use to me as I don't want to change the endianness in
> general.
What I mean is that you can annotate your code with version statements like:
version(LittleEndian)
{
    // perform the byteswap
    ...
}
so your code is portable to BigEndian systems (where you would not want
to byte swap).
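As a rough sketch of how that could look for the big-endian case discussed here: the function name bigEndianToNativeInPlace is hypothetical, and the swap uses std.bitmanip.swapEndian from Phobos (which accepts character types) in place of the manual shifts. It assumes the array was filled by reinterpreting big-endian bytes as wchar.

```d
import std.bitmanip : swapEndian;

// Hypothetical helper: the code units were read from big-endian bytes,
// so they only need swapping when the platform is little-endian.
void bigEndianToNativeInPlace(wchar[] units)
{
    version (LittleEndian)
    {
        foreach (ref unit; units)
            unit = swapEndian(unit);
    }
    // On BigEndian platforms the data is already in native order,
    // so nothing is compiled in here.
}

void main()
{
    // On a little-endian machine, the bytes 00 48 00 69 read as wchar
    // values give 0x4800 and 0x6900.
    wchar[] units = [cast(wchar) 0x4800, cast(wchar) 0x6900];
    bigEndianToNativeInPlace(units);
    version (LittleEndian)
        assert(units == "Hi"w);
}
```

The version block is resolved at compile time, so a big-endian build carries no swapping code at all.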
-Steve