Character set conversions

Jacob Carlborg doob at me.com
Mon May 30 23:50:15 PDT 2011


On 2011-05-30 19:57, "Jérôme M. Berger" wrote:
> Adam D. Ruppe wrote:
>> Kagamin wrote:
>>> Maybe it's his cgi lib? :)
>>> The client is free to send requests in any encoding, I suppose.
>>
>> In practice, that hasn't been a problem because browsers tend to
>> send requests in the same encoding as the HTML you served.
>>
>> Since D always outputs UTF-8, the browsers all send back UTF-8
>> too.
>>
>>
>> The first problem I had was that users can upload CSV files, which they
>> generally make in Excel... which apparently outputs Windows-1252.
>> That's fine for 99% of text, but then someone puts in a curly quote or
>> an em dash and it throws an "invalid UTF-8 sequence" error.
>>
>> Converting that is easy enough though.
>>
> 	Fun fact about Excel-generated CSV files: quite apart from encoding
> issues, the separator used between cells depends on the locale: for
> example, in English locales it uses a comma but in French locales it
> uses a semicolon...
>
> 	Just thought I'd point it out in case you did not know.
>
> 		Jerome

Yeah, that is a nightmare. I tried SYLK (Symbolic Link) as well; it's
something like CSV but more advanced, but it didn't work out that well
either. I ended up using real Excel documents with the help of the Ruby
gem "spreadsheet".
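For anyone who has to stay with CSV, both problems raised in this thread can usually be handled on the reading side. Below is a minimal Python sketch (not D, and not anyone's actual code from this thread), assuming an upload is either UTF-8 or Windows-1252 and uses either a comma or a semicolon as the separator. It decodes as UTF-8 first, falls back to Windows-1252, and then lets the standard library's csv.Sniffer guess the separator. The function name read_excel_csv is hypothetical.

```python
import csv
import io


def read_excel_csv(data: bytes) -> list[list[str]]:
    """Hypothetical helper: parse an Excel-exported CSV of unknown encoding/separator.

    Assumes the bytes are UTF-8 (optionally with a BOM) or Windows-1252,
    and that the separator is either ',' or ';'.
    """
    # Try UTF-8 first; "utf-8-sig" also strips a leading BOM if one is present.
    try:
        text = data.decode("utf-8-sig")
    except UnicodeDecodeError:
        # Fall back to Windows-1252. A few byte values (e.g. 0x81) are
        # undefined in cp1252, so replace them instead of raising.
        text = data.decode("cp1252", errors="replace")

    # Guess the separator from a sample, restricted to comma or semicolon.
    try:
        delimiter = csv.Sniffer().sniff(text[:4096], delimiters=",;").delimiter
    except csv.Error:
        # Sniffing fails e.g. on single-column data; default to a comma.
        delimiter = ","

    # newline="" keeps embedded line breaks intact for the csv module.
    return list(csv.reader(io.StringIO(text, newline=""), delimiter=delimiter))
```

The fallback order matters: Windows-1252 bytes such as 0x93 (a curly quote) are almost never valid UTF-8, so trying UTF-8 first rarely misclassifies a file, whereas decoding as Windows-1252 with replacement always succeeds, so it has to come last.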

-- 
/Jacob Carlborg


More information about the Digitalmars-d mailing list