encoding ISO-8859-1 to UTF-8 in std.net.curl
Alexsej via Digitalmars-d-learn
digitalmars-d-learn at puremagic.com
Mon Aug 8 15:05:08 PDT 2016
On Monday, 8 August 2016 at 21:11:26 UTC, ag0aep6g wrote:
> On 08/08/2016 09:57 PM, Alexsej wrote:
> >> // The content is transcoded from ISO-8859-1 to UTF-8, but I lose
> >> // the Cyrillic "<?xml version='1.0'
>> encoding='UTF-8'?>отсутствует или неверно задан параметр"
> >> // Instead I get "<?xml version='1.0'
>> encoding='UTF-8'?>оÑÑÑÑÑÑвÑÐµÑ Ð¸Ð»Ð¸ невеÑно
>> задан паÑамеÑÑ"
> >> // How do I change the encoding of the response to UTF-8?
>>
>>
>> string s = cast(immutable char[])content;
> >> auto f = File("output.txt", "w"); // output.txt is a UTF-8 file
>> f.write(s);
>
> The server doesn't include the encoding in the Content-Type
> header, right? So curl assumes the default, which is ISO
> 8859-1. It interprets the data as that and transcodes to UTF-8.
> The result is garbage, of course.
>
> I don't see a way to change the default encoding. Maybe that
> should be added.
>
> Until then you can reverse the wrong transcoding:
>
> ----
> import std.encoding: Latin1String, transcode;
> Latin1String pseudo_latin1;
> transcode(content.idup, pseudo_latin1);
> string s = cast(string) pseudo_latin1;
> ----
>
> Tiny rant:
>
> Why on earth does transcode only accept immutable characters
> for input? Every other post here uncovers some bug/shortcoming
> :(
// Response headers from the server:
server: nginx
date: Mon, 08 Aug 2016 22:02:15 GMT
content-type: text/xml; Charset=utf-8
content-length: 204
connection: keep-alive
vary: Accept-Encoding
cache-control: private
expires: Mon, 08 Aug 2016 22:02:15 GMT
set-cookie: ASPSESSIONIDSSCCDASA=KIAPMCMDMPEDHPBJNMGFHMEB; path=/
x-powered-by: ASP.NET
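
To make the quoted workaround easier to follow, here is a minimal sketch of the round trip. It assumes the garbling comes from UTF-8 bytes being read as ISO 8859-1 and then transcoded to UTF-8, as described in the quoted reply. The helper names simulateWrongTranscoding and undoWrongTranscoding are hypothetical and exist only for this illustration; the std.encoding calls are the same ones used in the quoted snippet.

```d
import std.encoding : Latin1String, transcode;
import std.stdio : writeln;

// Hypothetical helper: reproduces the suspected fault by treating each
// UTF-8 byte as one Latin-1 character and transcoding that to UTF-8.
string simulateWrongTranscoding(string utf8)
{
    auto asLatin1 = cast(Latin1String) utf8; // reinterpret the bytes, no conversion
    string garbled;
    transcode(asLatin1, garbled);
    return garbled;
}

// Hypothetical helper: the workaround from the quoted reply. Transcoding
// back to Latin-1 yields the original bytes, which are already UTF-8.
string undoWrongTranscoding(string garbled)
{
    Latin1String pseudoLatin1;
    transcode(garbled, pseudoLatin1);
    return cast(string) pseudoLatin1;
}

void main()
{
    string original = "параметр";
    string garbled = simulateWrongTranscoding(original);
    string restored = undoWrongTranscoding(garbled);

    assert(garbled != original);  // the wrong step really changes the text
    assert(restored == original); // the reverse step recovers it
    writeln(restored);
}
```

This only helps if the response was garbled in exactly this way; text that went through any other conversion would not be restored by it.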