UTF-16 endianness
Steven Schveighoffer via Digitalmars-d-learn
digitalmars-d-learn at puremagic.com
Fri Jan 29 14:43:26 PST 2016
On 1/29/16 5:36 PM, Marek Janukowicz wrote:
> I have trouble understanding how endianness works for UTF-16.
>
> For example, the UTF-16 code unit for the 'ł' character is 0x0142. But this
> program shows otherwise:
>
> import std.stdio;
>
> public void main () {
>     ubyte[] properOrder = [0x01, 0x42];
>     ubyte[] reverseOrder = [0x42, 0x01];
>     writefln( "proper: %s, reverse: %s",
>               cast(wchar[])properOrder,
>               cast(wchar[])reverseOrder );
> }
>
> output:
>
> proper: 䈁, reverse: ł
>
> Is there anything I should know about UTF endianness?
It's no different from the endianness of any other multi-byte value.
In other words, a UTF-16 code unit is expected to be in the endianness of
the platform you are running on.
If you are on x86 or x86_64 (very likely), then it should be little-endian,
so the bytes [0x42, 0x01] are read as the code unit 0x0142.
If your source of data is big-endian (or otherwise the opposite of your native
endianness), it will have to be converted before you treat it as a wchar[].
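
As a rough sketch of such a conversion, the helper below builds each code unit
arithmetically from a big-endian byte pair, so the result should be the same on
any host. The name fromBigEndianUtf16 is hypothetical (it is not a Phobos
function), and the sketch assumes the input has an even number of bytes and no
byte-order mark.

```d
import std.stdio : writeln;

// Hypothetical helper (not part of Phobos): decode big-endian UTF-16 bytes
// into native-endian code units, whatever the host byte order is.
wchar[] fromBigEndianUtf16(const(ubyte)[] bytes)
{
    assert(bytes.length % 2 == 0, "UTF-16 data needs an even number of bytes");
    auto units = new wchar[](bytes.length / 2);
    foreach (i, ref unit; units)
    {
        // The high byte comes first in big-endian data. Shifting builds the
        // value arithmetically, so no reinterpreting cast is involved.
        unit = cast(wchar)((bytes[2 * i] << 8) | bytes[2 * i + 1]);
    }
    return units;
}

void main()
{
    ubyte[] bigEndianBytes = [0x01, 0x42]; // U+0142 stored high byte first
    writeln(fromBigEndianUtf16(bigEndianBytes));
}
```

Phobos also provides byte-order helpers in std.bitmanip (for example
bigEndianToNative and swapEndian), which may be preferable to hand-written
shifts in real code.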
Note that the version identifiers BigEndian and LittleEndian can be used to
select the correct code at compile time.
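
A minimal sketch of how those version identifiers might be used is shown below.
The helper names swapBytes and bigEndianToHost are made up for illustration;
only the version identifiers themselves come from the language.

```d
import std.stdio : writeln;

// Hypothetical helper: swap the two bytes of one UTF-16 code unit.
wchar swapBytes(wchar unit)
{
    return cast(wchar)((unit << 8) | (unit >> 8));
}

// Hypothetical helper: take a code unit whose bytes were stored big-endian
// but read with the host's byte order, and return the intended value.
wchar bigEndianToHost(wchar unit)
{
    version (LittleEndian)
        return swapBytes(unit); // host order differs from the data
    else version (BigEndian)
        return unit;            // data is already in host order
    else
        static assert(0, "unknown endianness");
}

void main()
{
    // Reinterpret the raw bytes as one native code unit, as the cast in the
    // original program does, then fix up the byte order.
    ubyte[] raw = [0x01, 0x42];
    wchar[] units = cast(wchar[]) raw;
    units[0] = bigEndianToHost(units[0]);
    writeln(units);
}
```

Only one branch of the version block is compiled, so the choice carries no
run-time cost.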
-Steve