How can I convert a file encoded in CP936 to a file with UTF-8 encoding

rocex rocexwang at gmail.com
Wed Jul 13 13:13:03 UTC 2022


On Wednesday, 13 July 2022 at 12:00:43 UTC, Adam D Ruppe wrote:
> On Wednesday, 13 July 2022 at 11:47:56 UTC, rocex wrote:
>> How can I convert a file encoded in CP936 to a file with UTF-8 
>> encoding
>
> My lib doesn't have it included but the basic idea is to take 
> this table:
>
> https://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP936.TXT
>
> and do the conversions. So loop through the bytes: if a byte is 
> < 128, it stays the same; if it == 128, it is 0x20AC; and if it 
> is greater than that, you need to read the second byte too and 
> look the pair up in that table.
>
> It looks like many of the values increase in sequence, so you 
> might only need part of the actual lookup table, and the rest 
> you can do with some addition. From lead byte 83 on, it looks 
> like an almost sequential offset. It is probably safest to 
> just copy the whole table.

I found this: https://github.com/guotie/gogb2312. The algorithm 
should be the same.
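
For reference, the loop described in the quoted reply could be sketched 
roughly like this. It is written in Python only to keep it short; the 
same structure should carry over to D. The function names 
(parse_mapping, decode_cp936, convert_file) and the SAMPLE excerpt are 
made up for illustration, and the line layout of CP936.TXT (hex code, 
tab, hex Unicode value, "#" comment) is an assumption to check against 
the real file. This is not a complete or validated converter.

```python
# Hypothetical excerpt in the assumed CP936.TXT layout; a real run
# would read the full table from the unicode.org file instead.
SAMPLE = (
    "0x80\t0x20AC\t#EURO SIGN\n"
    "0x81\t\t#DBCS LEAD BYTE\n"
    "0x8140\t0x4E02\t#CJK UNIFIED IDEOGRAPH\n"
    "0xCEC4\t0x6587\t#CJK UNIFIED IDEOGRAPH\n"
    "0xD6D0\t0x4E2D\t#CJK UNIFIED IDEOGRAPH\n"
)


def parse_mapping(lines):
    """Build {cp936_code: unicode_code_point} from mapping-table lines."""
    table = {}
    for line in lines:
        fields = line.split("#", 1)[0].split()  # drop the trailing comment
        if len(fields) < 2:
            continue  # blank line, comment, or entry with no mapping
        table[int(fields[0], 16)] = int(fields[1], 16)
    return table


def decode_cp936(data, table):
    """Decode CP936 bytes to a str using the byte rules from the thread."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            out.append(chr(b))      # below 128: stays the same
            i += 1
        elif b == 0x80:
            out.append("\u20ac")    # exactly 128: U+20AC
            i += 1
        else:
            # Above 128: read the second byte and look up the pair.
            if i + 1 >= len(data):
                raise ValueError("truncated sequence at offset %d" % i)
            code = (b << 8) | data[i + 1]
            if code not in table:
                raise ValueError("no mapping for 0x%04X" % code)
            out.append(chr(table[code]))
            i += 2
    return "".join(out)


def convert_file(src, dst, table):
    """Read a CP936 file and write the same text as UTF-8."""
    with open(src, "rb") as f:
        text = decode_cp936(f.read(), table)
    # newline="" keeps the original line endings untouched.
    with open(dst, "w", encoding="utf-8", newline="") as f:
        f.write(text)
```

With the full table loaded through parse_mapping, calling 
convert_file("in.txt", "out.txt", table) would do the whole conversion; 
both file names are placeholders.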

