Character set conversions

Adam D. Ruppe destructionator at gmail.com
Mon May 30 13:40:04 PDT 2011


>  Fun fact about Excel-generated CSV files: quite apart from encoding
> issues, the separator used between cells depends on the locale: for
> example, in English locales it uses a comma but in French locales it
> uses a semicolon...

Yeah, I've seen the semicolon in the wild before too, though I didn't
know it was a locale thing.

My program solves it by confirming with the user. When you upload a
file, it tries to parse it with a few different assumptions. The
one that looks best is presented back to the user. ("Looks best" means
it has headings that roughly match what we expect and a number of
columns that's more or less consistent.)
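
A rough sketch of that "try a few assumptions and keep the best one"
idea, in Python. This is not the actual program: the function names,
the candidate separators, and the scoring formula are all assumptions
made up for illustration.

```python
import csv
import io
from collections import Counter


def score_parse(rows, expected_headings):
    """Score one parse attempt: heading matches plus column consistency.

    The weighting is an arbitrary illustrative choice.
    """
    if not rows:
        return 0.0
    header = {cell.strip().lower() for cell in rows[0]}
    heading_hits = sum(1 for h in expected_headings if h.lower() in header)
    # Fraction of rows that share the most common column count.
    widths = Counter(len(r) for r in rows)
    common_width, count = widths.most_common(1)[0]
    consistency = count / len(rows)
    if common_width < 2:
        # A single column everywhere means the separator never matched.
        consistency = 0.0
    return heading_hits + consistency


def guess_delimiter(text, expected_headings, candidates=(",", ";", "\t")):
    """Parse with each candidate separator and return the best guess.

    Returns (delimiter, rows) so the result can be shown to the user
    for confirmation.
    """
    best = None
    for delim in candidates:
        reader = csv.reader(io.StringIO(text), delimiter=delim)
        rows = [r for r in reader if r]  # skip blank lines
        s = score_parse(rows, expected_headings)
        if best is None or s > best[0]:
            best = (s, delim, rows)
    return best[1], best[2]
```

The returned rows would be what gets presented back to the user, who
can still override the guess.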

It does charset the same way, actually. First, guess UTF-8. If that
doesn't validate, assume it's Windows-1252 unless told otherwise.
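
The charset guess can be sketched the same way in Python. Again this is
only an illustration under assumptions: `decode_upload` and its
`fallback` parameter are hypothetical names, with `fallback` standing in
for "unless told otherwise".

```python
def decode_upload(data, fallback="cp1252"):
    """Decode uploaded bytes: try strict UTF-8 first, then the fallback.

    Returns (text, charset_used) so the guess can be confirmed by the user.
    """
    try:
        # Strict decoding doubles as validation: invalid UTF-8 raises.
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Python's cp1252 codec leaves a few bytes undefined, so replace
        # those rather than failing the whole upload.
        return data.decode(fallback, errors="replace"), fallback
```

Trying UTF-8 first works because text in a legacy single-byte encoding
that contains non-ASCII characters rarely happens to be valid UTF-8.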

The user then confirms the guesses and organizes the final data
import.


It's worked out pretty well so far aside from unsupported charsets;
the users seem to like it.


More information about the Digitalmars-d mailing list