encoding ISO-8859-1 to UTF-8 in std.net.curl

ag0aep6g via Digitalmars-d-learn digitalmars-d-learn at puremagic.com
Mon Aug 8 14:11:26 PDT 2016


On 08/08/2016 09:57 PM, Alexsej wrote:
>     // content in ISO-8859-1 to UTF-8 encoding but I lose
>         //the Cyrillic "<?xml version='1.0'
> encoding='UTF-8'?>отсутствует или неверно задан параметр"
>     // I get it "<?xml version='1.0'
> encoding='UTF-8'?>отсутствует или неверно
> задан параметр"
>     // How do I change the encoding to UTF-8 in response
>
>
>     string s = cast(immutable char[])content;
>     auto f = File("output.txt","w");  // output.txt file in UTF-8;
>     f.write(s);

The server doesn't include the encoding in the Content-Type header, 
right? So std.net.curl assumes the default, which is ISO 8859-1. It 
interprets the data (which is actually UTF-8) as that and transcodes it 
to UTF-8. The result is garbage, of course.

I don't see a way to change the default encoding. Maybe that should be 
added.
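
Not a way to change the default, but possibly a way around it: `get` in std.net.curl can be instantiated with `ubyte` instead of `char`, and then the body comes back as raw bytes with no decoding applied. The sketch below is untested against your server and assumes a plain GET is enough; `asUtf8` and `fetchUtf8` are made-up helper names, not part of Phobos.

```d
import std.net.curl : get, HTTP;
import std.utf : validate;

// Hypothetical helper: treat raw response bytes as UTF-8 text.
string asUtf8(const(ubyte)[] raw)
{
    auto s = cast(string) raw.idup; // copy, then reinterpret the bytes as chars
    validate(s);                    // throws UTFException if this is not valid UTF-8
    return s;
}

// Hypothetical helper; assumes the server really sends UTF-8.
string fetchUtf8(const(char)[] url)
{
    // With T = ubyte, get returns the body without transcoding it.
    ubyte[] raw = get!(HTTP, ubyte)(url);
    return asUtf8(raw);
}

void main(string[] args)
{
    import std.stdio : File;

    // The URL is taken from the command line; nothing is fetched without one.
    if (args.length > 1)
        File("output.txt", "w").write(fetchUtf8(args[1]));
}
```

The `validate` call is there so that a response that is not UTF-8 after all fails loudly instead of ending up in the file.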

Until then you can reverse the wrong transcoding:

----
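import std.encoding: Latin1String, transcode;
// content is UTF-8 that was wrongly decoded as Latin-1 and re-encoded.
// Transcoding it back to Latin-1 recovers the bytes the server sent.
Latin1String pseudo_latin1;
transcode(content.idup, pseudo_latin1);
// Those bytes are the real UTF-8 data, so reinterpret them as a string.
string s = cast(string) pseudo_latin1;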
----
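
To see that the reversal really is lossless, here is a small self-contained sketch that first reproduces the wrong transcoding and then undoes it. `mangle` and `unmangle` are hypothetical names used only for this illustration, and `mangle` is my guess at what effectively happens to the response, not code taken from std.net.curl.

```d
import std.encoding : Latin1String, transcode;

// Hypothetical helper: simulate the wrong decoding by treating every byte
// of the UTF-8 text as one Latin-1 character and transcoding that to UTF-8.
string mangle(string utf8)
{
    string result;
    transcode(cast(Latin1String) utf8, result);
    return result;
}

// Hypothetical helper: the workaround from above, wrapped in a function.
string unmangle(const(char)[] garbled)
{
    Latin1String pseudoLatin1;
    transcode(garbled.idup, pseudoLatin1);
    return cast(string) pseudoLatin1;
}

void main()
{
    string original = "<?xml version='1.0' encoding='UTF-8'?>параметр";
    string garbled = mangle(original);

    assert(garbled != original);           // the Cyrillic part got mangled
    assert(unmangle(garbled) == original); // and the round trip restores it
}
```

This only works while every character in the garbled text fits into Latin-1, which is the case when the garbage came from exactly this kind of wrong decoding.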

Tiny rant:

Why on earth does transcode only accept immutable characters for input? 
Every other post here uncovers some bug/shortcoming :(

