Transparent ANSI to UTF-8 conversion
monarch_dodra
monarchdodra at gmail.com
Wed Feb 27 04:20:50 PST 2013
On Wednesday, 27 February 2013 at 10:56:16 UTC, Lubos Pintes
wrote:
> Hi,
> I would like to transparently convert from ANSI to UTF-8 when
> dealing with text files. For example here in Slovakia,
> virtually every text file is in Windows-1250.
> If someone opens a text file, he or she expects that it will
> work properly. So I suppose, that it is not feasible to tell
> someone "if you want to use my program, please convert every
> text to UTF-8".
>
> Obtaining the mapping from ANSI to Unicode for a particular code
> page is trivial. Maybe even MultiByteToWideChar could help with
> this.
>
> I however need to know how to do it "D-way". Could I define
> something like TextReader class? Or perhaps some support
> already exists somewhere?
> Thank
I'd say the D way would be to simply exploit the fact that UTF is
built into the language, and as such, not worry about encoding,
and use raw code points.
You get your "code page to Unicode *code point*" table, and then you
simply map each byte to a dchar. From there, D will itself
convert your raw Unicode (aka UTF-32) to UTF-8 on the fly, when
you need it. For example, writing a dchar or dstring to a file with
std.stdio will automatically encode it as UTF-8. You can also simply
use std.conv.to!string to convert any UTF string type to UTF-8
(to!wstring and to!dstring do the same for UTF-16 and UTF-32).
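
As a rough sketch of the idea above (not a complete implementation): the
names cp1250ToDchar and cp1250ToUtf8 are made up for illustration, and the
table holds only a handful of Windows-1250 entries. A real version would
need all 128 high-byte mappings for the code page.

```d
import std.conv : to;
import std.stdio : writeln;

// Hypothetical helper: maps one Windows-1250 byte to a Unicode code point.
// Only a few high-byte entries are listed; the rest fall back to U+FFFD.
dchar cp1250ToDchar(ubyte b)
{
    if (b < 0x80)
        return cast(dchar) b;        // ASCII range maps to itself

    dchar result = '\uFFFD';         // replacement character for unmapped bytes
    switch (b)
    {
        case 0x8A: result = '\u0160'; break; // S with caron
        case 0x9A: result = '\u0161'; break; // s with caron
        case 0xE1: result = '\u00E1'; break; // a with acute
        case 0xE8: result = '\u010D'; break; // c with caron
        case 0xF2: result = '\u0148'; break; // n with caron
        default: break;
    }
    return result;
}

// Hypothetical helper: decodes a whole buffer and lets std.conv.to
// do the UTF-32 to UTF-8 conversion.
string cp1250ToUtf8(const(ubyte)[] raw)
{
    dchar[] decoded;
    decoded.reserve(raw.length);
    foreach (b; raw)
        decoded ~= cp1250ToDchar(b);
    return to!string(decoded);
}

void main()
{
    // Bytes E8 61 9A 61 in Windows-1250: c-caron, a, s-caron, a
    immutable ubyte[] raw = [0xE8, 0x61, 0x9A, 0x61];
    string utf8 = cp1250ToUtf8(raw);
    assert(utf8 == "\u010Da\u0161a");
    writeln(utf8);                   // written to stdout as UTF-8
}
```

The only code-page-specific part is the byte to dchar lookup; everything
after that is ordinary D string handling.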
This may not be as efficient as a "true" "code page to UTF-8" table,
but:
1) Given you'll most probably be IO bound anyway, who cares?
2) Scalability. D does everything but the code page to code point
mapping, so supporting another code page only means supplying
another table. Why bother doing any more than that?