How can I convert a file encoded in CP936 to a file with UTF-8 encoding
rocex
rocexwang at gmail.com
Wed Jul 13 13:13:03 UTC 2022
On Wednesday, 13 July 2022 at 12:00:43 UTC, Adam D Ruppe wrote:
> On Wednesday, 13 July 2022 at 11:47:56 UTC, rocex wrote:
>> How can I convert a file encoded in CP936 to a file with UTF-8
>> encoding
>
> My lib doesn't have it included but the basic idea is to take
> this table:
>
> https://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP936.TXT
>
> and do the conversions. So loop through the bytes: if a byte is
> < 128, it stays the same; if it == 128, it is U+20AC (the euro
> sign); and if it is greater than that, you need to read the
> second byte too and look the pair up in that table.
>
> It looks like for many of the bytes, they increase in sequence,
> so you might only need part of the actual lookup table, and the
> rest you can do with some addition. Looks like from lead byte
> 83 it is a.... almost sequential offset. Probably safest to
> just copy the whole table.
I found this: https://github.com/guotie/gogb2312. The algorithm
should be the same.
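
For reference, the table-lookup approach described in the quoted reply can be sketched roughly as below. This is only an illustrative sketch in Python, not code from this thread or from gogb2312. The function names (parse_cp936_table, cp936_to_utf8, convert_file) are made up for the example. It assumes CP936.TXT has been downloaded next to the script, and that bytes with no table entry should become U+FFFD rather than raise an error. The same structure should carry over to D without much change.

```python
def parse_cp936_table(lines):
    """Build {CP936 code: Unicode code point} from CP936.TXT-style lines."""
    table = {}
    for line in lines:
        # Drop trailing comments and blank lines.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2:
            # Rows such as "0x81  #DBCS LEAD BYTE" carry no Unicode column.
            continue
        table[int(fields[0], 16)] = int(fields[1], 16)
    return table


def cp936_to_utf8(data, table):
    """Decode CP936 bytes through the table and return UTF-8 bytes."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            # ASCII range stays the same.
            out.append(chr(b))
            i += 1
        elif b in table:
            # Single-byte entry, e.g. 0x80 -> U+20AC.
            out.append(chr(table[b]))
            i += 1
        elif i + 1 < len(data) and ((b << 8) | data[i + 1]) in table:
            # Lead byte: read the second byte and look up the pair.
            out.append(chr(table[(b << 8) | data[i + 1]]))
            i += 2
        else:
            # Assumption: unknown or truncated sequences become U+FFFD.
            out.append("\ufffd")
            i += 1
    return "".join(out).encode("utf-8")


def convert_file(src_path, dst_path, table_path="CP936.TXT"):
    """Hypothetical helper: convert a whole file from CP936 to UTF-8."""
    with open(table_path, encoding="ascii", errors="replace") as f:
        table = parse_cp936_table(f)
    with open(src_path, "rb") as f:
        data = f.read()
    with open(dst_path, "wb") as f:
        f.write(cp936_to_utf8(data, table))
```

Calling convert_file("in.txt", "out.txt") would then read the table once and rewrite the file as UTF-8. Keeping the whole table in a dictionary follows the "probably safest to just copy the whole table" advice rather than relying on the near-sequential offsets.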
More information about the Digitalmars-d-learn
mailing list