How can I convert a file encoded in CP936 to a file with UTF-8 encoding

Adam D Ruppe destructionator at gmail.com
Wed Jul 13 12:00:43 UTC 2022


On Wednesday, 13 July 2022 at 11:47:56 UTC, rocex wrote:
> How can I convert a file encoded in CP936 to a file with UTF-8 
> encoding

My lib doesn't include this conversion, but the basic idea is to 
take this table:

https://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP936.TXT

and do the conversions. Loop through the bytes of the input: if a 
byte is < 128, it stays the same; if it is == 128, it becomes 0x20AC 
(the euro sign); and if it is greater than that, you need to read the 
second byte too and look the pair up in that table.
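
A minimal sketch of that loop in D, under a few assumptions that are 
not from the table itself: the function name `cp936ToUtf8` is made 
up, the lookup table is an associative array keyed by 
(lead byte << 8) | trail byte, and anything that cannot be looked up 
(an unmapped pair, or a lead byte at the very end of the input) is 
replaced with U+FFFD instead of raising an error.

```d
import std.utf : encode;

/// Decodes CP936 bytes into a UTF-8 string.
/// `table` is a hypothetical lookup keyed by (lead << 8) | trail.
string cp936ToUtf8(const(ubyte)[] input, const dchar[ushort] table)
{
    char[] output;
    char[4] buf;

    // Append one code point to the output as UTF-8.
    void put(dchar c)
    {
        immutable len = encode(buf, c);
        output ~= buf[0 .. len];
    }

    size_t i = 0;
    while (i < input.length)
    {
        immutable ubyte b = input[i++];
        if (b < 0x80)
            put(cast(dchar) b);        // ASCII stays the same
        else if (b == 0x80)
            put('\u20AC');             // 0x80 is the euro sign
        else if (i < input.length)
        {
            // Lead byte: read the second byte and look the pair up.
            immutable ushort key = cast(ushort)((b << 8) | input[i++]);
            if (auto p = key in table)
                put(*p);
            else
                put('\uFFFD');         // assumption: unmapped pair
        }
        else
            put('\uFFFD');             // assumption: truncated input
    }
    return output.idup;
}
```

With a table that contains only the entry 0xD6D0 -> U+4E2D, the bytes 
`[0x41, 0x80, 0xD6, 0xD0]` decode to "A€中". For a whole file, the 
input could come from `std.file.read` (cast to `const(ubyte)[]`) and 
the result could be written back out with `std.file.write`.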

It looks like for many of the bytes the mapped values increase in 
sequence, so you might only need part of the actual lookup table and 
could do the rest with some addition. From lead byte 83 it looks like 
an almost sequential offset. It is probably safest to just copy the 
whole table, though.
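
One way to take the whole table without retyping it is to parse the 
text file itself. The sketch below assumes the layout that file 
appears to use: `#` comment lines, and data lines with two 
whitespace-separated hex fields (`0x` prefixed), the CP936 code first 
and the Unicode code point second. The function name 
`parseMappingTable` is made up. Lines without a second hex field, 
such as the lead-byte markers, are skipped.

```d
import std.algorithm.searching : startsWith;
import std.array : split;
import std.conv : to;
import std.string : lineSplitter;

/// Builds a lookup from the text of a mapping file.
/// Keys are the CP936 code as written in the file, so a two-byte
/// code 0xD6D0 is stored under the key 0xD6D0.
dchar[ushort] parseMappingTable(string text)
{
    dchar[ushort] table;
    foreach (line; text.lineSplitter)
    {
        auto fields = line.split(); // split on whitespace
        // Skip comments, blank lines, and lines with no mapping.
        if (fields.length < 2
            || !fields[0].startsWith("0x")
            || !fields[1].startsWith("0x"))
            continue;

        immutable code = fields[0][2 .. $].to!uint(16);
        immutable unicode = fields[1][2 .. $].to!uint(16);
        table[cast(ushort) code] = cast(dchar) unicode;
    }
    return table;
}
```

The text could be loaded at run time with 
`std.file.readText("CP936.TXT")`, assuming the file has been 
downloaded next to the program under that name.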

