Character set conversions
Adam D. Ruppe
destructionator at gmail.com
Mon May 30 07:13:27 PDT 2011
Kagamin wrote:
> May be, it's his cgi lib? :)
> Client is free to send requests in any encoding, I suppose.
In practice, that hasn't been a problem because browsers tend to
send requests in the same encoding as the html you served.
Since the D code always outputs utf8, the browsers all send back
utf8 too.
The first problem I had was that users can upload csv files, which
they generally make in Excel... which apparently outputs Windows-1252.
That's fine for 99% of text, since plain ASCII is the same in both,
but then someone puts in a curly quote or an em dash and it throws
an invalid utf-8 sequence.
Converting that is easy enough though.
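
For illustration only, here is a minimal sketch of that kind of
conversion. It is written in Python rather than D and is not the code
from my cgi lib; the function name to_utf8 is made up for the example.
It assumes that anything which isn't already valid utf-8 is
Windows-1252, which matches the Excel case above but is a guess for
arbitrary input.

```python
def to_utf8(raw: bytes) -> bytes:
    """Return raw as utf-8 bytes, assuming Windows-1252 if it isn't utf-8 already."""
    try:
        # Valid utf-8 (including plain ASCII) passes through unchanged.
        raw.decode("utf-8")
        return raw
    except UnicodeDecodeError:
        # Assumption: the only other encoding we expect is Windows-1252.
        # A few byte values are undefined in that code page, so replace
        # them with U+FFFD instead of raising a second time.
        return raw.decode("cp1252", errors="replace").encode("utf-8")


if __name__ == "__main__":
    # 0x93/0x94 are curly quotes and 0x97 is an em dash in Windows-1252.
    sample = b"\x93quoted\x94 \x97 plain"
    print(to_utf8(sample).decode("utf-8"))
```

Trying utf-8 first matters: a file that is already utf-8 would turn
into mojibake if it were blindly run through the Windows-1252 path.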
The second problem is that I now want to fetch and process random
websites on the internet, and they come in a variety of encodings...
again, utf-8 covers a big majority, but not all of them.
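
As a rough sketch of one way to handle fetched pages (again Python for
illustration, not my actual code): take the charset the server declares
in its Content-Type header, try that, then fall back to utf-8 and
finally Windows-1252. The names charset_from_content_type and
decode_body are hypothetical, and the fallback order is an assumption.
It ignores <meta> tags and byte order marks, which real pages also use.

```python
from email.message import Message


def charset_from_content_type(header: str, default: str = "utf-8") -> str:
    """Pull the charset parameter out of a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = header
    # get_content_charset() returns None when no charset is declared.
    return msg.get_content_charset() or default


def decode_body(raw: bytes, content_type: str) -> str:
    """Decode a response body using the declared charset, with fallbacks."""
    candidates = [charset_from_content_type(content_type), "utf-8"]
    for name in candidates:
        try:
            return raw.decode(name)
        except (LookupError, UnicodeDecodeError):
            # Unknown charset name, or the server declared the wrong one.
            continue
    # Last resort (assumption): treat it as Windows-1252 and never fail.
    return raw.decode("cp1252", errors="replace")


if __name__ == "__main__":
    body = "caf\u00e9".encode("latin-1")
    print(decode_body(body, "text/html; charset=ISO-8859-1"))
```

The declared charset is only a hint, since servers often get it wrong,
which is why a decoding failure falls through to the next candidate
instead of being treated as fatal.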