How can I convert a file encoded in CP936 to a file with UTF-8 encoding
Adam D Ruppe
destructionator at gmail.com
Wed Jul 13 12:00:43 UTC 2022
On Wednesday, 13 July 2022 at 11:47:56 UTC, rocex wrote:
> How can I convert a file encoded in CP936 to a file with UTF-8
> encoding
My lib doesn't have it included but the basic idea is to take
this table:
https://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP936.TXT
and do the conversions. So loop through the input bytes: if a
byte is < 128, it stays the same; if it is == 128, it becomes
0x20AC (the euro sign); and if it is greater than that, you need
to read the second byte too and look the pair up in that table.
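As a rough sketch of that loop (written in Python for brevity; the same structure should port to D), assuming the mapping file has already been downloaded and its text is passed in. The names `parse_table` and `cp936_to_utf8` are made up for illustration, and the parser assumes the file's lines look like `0x8140<TAB>0x4E02<TAB>#comment`, with lead-byte lines carrying no Unicode column:

```python
def parse_table(text):
    """Build {CP936 code: Unicode code point} from the mapping file's text."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0]  # drop the trailing comment
        fields = line.split()
        if len(fields) < 2:
            continue  # blank, comment-only, or lead-byte marker line
        table[int(fields[0], 16)] = int(fields[1], 16)
    return table


def cp936_to_utf8(data, table):
    """Decode CP936 bytes via the table and return UTF-8 encoded bytes."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            out.append(chr(b))  # ASCII stays the same
            i += 1
        elif b == 0x80:
            out.append("\u20ac")  # single byte 0x80 is the euro sign
            i += 1
        else:
            if i + 1 >= len(data):
                raise ValueError("truncated double-byte sequence at end of input")
            code = (b << 8) | data[i + 1]  # lead byte followed by second byte
            if code not in table:
                raise ValueError(f"no mapping for 0x{code:04X}")
            out.append(chr(table[code]))
            i += 2
    return "".join(out).encode("utf-8")
```

To convert a file, read it in binary mode, pass the bytes through `cp936_to_utf8`, and write the result back out in binary mode. Raising on an unmapped pair is only one possible choice; substituting U+FFFD would be another.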
It looks like for many of the bytes, the mapped values increase
in sequence, so you might only need part of the actual lookup
table, and the rest you can do with some addition. Looks like
from lead byte 83 on it is an almost sequential offset. Probably
safest to just copy the whole table.
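One way to check how sequential the table really is, before deciding whether to trim it, is to group the entries into runs where the CP936 code and the Unicode code point both go up by one. A small sketch in Python, where `find_runs` is a made-up helper and `table` is assumed to be a dict from CP936 code to code point built from the mapping file:

```python
def find_runs(table):
    """Group entries into (first_code, first_codepoint, length) runs.

    A run continues while both the CP936 code and the Unicode code
    point increase by exactly one from the previous entry.
    """
    runs = []
    prev_code = prev_cp = None
    for code in sorted(table):
        cp = table[code]
        if prev_code is not None and code == prev_code + 1 and cp == prev_cp + 1:
            first_code, first_cp, length = runs[-1]
            runs[-1] = (first_code, first_cp, length + 1)  # extend current run
        else:
            runs.append((code, cp, 1))  # start a new run
        prev_code, prev_cp = code, cp
    return runs
```

If `len(find_runs(table))` comes out much smaller than `len(table)`, storing the runs and adding an offset could replace most of the lookup; if not, copying the whole table is the simpler route.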