encoding ISO-8859-1 to UTF-8 in std.net.curl
ag0aep6g via Digitalmars-d-learn
digitalmars-d-learn at puremagic.com
Mon Aug 8 14:11:26 PDT 2016
On 08/08/2016 09:57 PM, Alexsej wrote:
> // content in ISO-8859-1 to UTF-8 encoding but I lose
> //the Cyrillic "<?xml version='1.0'
> encoding='UTF-8'?>отсутствует или неверно задан параметр"
> // I get it "<?xml version='1.0'
> encoding='UTF-8'?>оÑÑÑÑÑÑвÑÐµÑ Ð¸Ð»Ð¸ невеÑно
> задан паÑамеÑÑ"
> // How do I change the encoding to UTF-8 in response
>
>
> string s = cast(immutable char[])content;
> auto f = File("output.txt","w"); // output.txt file in UTF-8;
> f.write(s);
The server doesn't include the encoding in the Content-Type header,
right? So curl falls back to the default, which is ISO 8859-1. It
interprets the UTF-8 bytes as ISO 8859-1 and transcodes them to UTF-8,
so every byte of a multi-byte character becomes a character of its own.
The result is garbage, of course.
I don't see a way to change the default encoding. Maybe that should be
added.
Until then you can reverse the wrong transcoding:
----
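import std.encoding: Latin1String, transcode;
// Encode the garbled text back to Latin-1; each character turns back
// into the byte the server originally sent.
Latin1String pseudo_latin1;
transcode(content.idup, pseudo_latin1);
// Those bytes are the real UTF-8 data, so reinterpret them as a string.
string s = cast(string) pseudo_latin1;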
----
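To see why that works, here is a minimal, self-contained sketch of the round trip. It does not touch std.net.curl; the helper names misdecodeAsLatin1 and undoLatin1Misdecoding are made up for illustration, and the first one only simulates what I assume happens to the response (UTF-8 bytes read as ISO 8859-1 and transcoded to UTF-8).

```d
import std.encoding : Latin1String, transcode;
import std.stdio : writeln;

/// Hypothetical helper: simulates the wrong step by treating each
/// UTF-8 byte as one Latin-1 character and transcoding that to UTF-8.
string misdecodeAsLatin1(string utf8)
{
    // Reinterpret the bytes as Latin-1; no conversion happens here.
    auto asLatin1 = cast(Latin1String) utf8;
    string garbled;
    transcode(asLatin1, garbled);
    return garbled;
}

/// Hypothetical helper: the reversal from the snippet above.
string undoLatin1Misdecoding(string garbled)
{
    // Encoding back to Latin-1 restores the original bytes ...
    Latin1String bytes;
    transcode(garbled, bytes);
    // ... which are valid UTF-8, so reinterpret them as a string.
    return cast(string) bytes;
}

void main()
{
    string original = "параметр";
    string garbled = misdecodeAsLatin1(original);

    // Every byte of the Cyrillic text is >= 0x80, so each one grows
    // into a two-byte UTF-8 sequence.
    assert(garbled != original);
    assert(garbled.length == 2 * original.length);

    // The reversal recovers the original text.
    assert(undoLatin1Misdecoding(garbled) == original);
    writeln(undoLatin1Misdecoding(garbled));
}
```

The reversal only works as long as nothing was lost on the way: if some step replaced bytes it could not decode, the original text can't be recovered this way.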
Tiny rant:
Why on earth does transcode only accept immutable characters for input?
Every other post here uncovers some bug/shortcoming :(