ANSI to UTF8 problem
Nick Sabalausky
a at a.a
Mon Aug 16 18:58:21 PDT 2010
"jicman" <cabrera_ at _wrc.xerox.com> wrote in message
news:i4cn8h$2vtn$1 at digitalmars.com...
>
> Greetings.
>
> I have this program,
>
> import std.stdio;
> import juno.base.text;
> import std.file;
> import std.windows.charset;
> import std.utf;
>
> int main(char[][] args)
> {
>     char[] ansi = r"c:\ansi.txt";
>     char[] utf8 = r"c:\utf8.txt";
>     try
>     {
>         char[] t = cast(char[]) read(ansi);
>         write(utf8, std.windows.charset.fromMBSz(t.ptr,0));
>         writefln(" converted to UTF8.");
>     }
>     catch (UtfException e)
>     {
>         writefln(" is not ANSI");
>         return 1;
>     }
>     return(0);
> }
>
> The ansi.txt file contains:
>
> josé
> áéíóúñÑ
>
> The utf8.txt file, when opened with Wordpad, looks like this:
>
> josé
> áéÃóúñÃ
>
> The file did change from ANSI to UTF8; however, it displays incorrectly in
> Wordpad. The problem is that one application I am trying to feed these
> UTF8 files to shows the same problem as Wordpad.
>
> Any help would be greatly appreciated.
>
> thanks,
>
> josé
The utf8.txt file is probably missing the UTF-8 BOM (I'm not familiar with
fromMBSz: I *assume* it doesn't add the BOM, but maybe I'm wrong?). Without
that BOM, Wordpad is probably treating the file as 8-bit text in the
system's ANSI code page (e.g. Windows-1252) instead of UTF-8.
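To see why each accented letter turns into two characters, it helps to dump the bytes. Here's a minimal sketch in D2 (not the D1 dialect used in the post above), using only Phobos' std.string.representation. It assumes Wordpad falls back to Windows-1252 when there's no BOM.

```d
// Minimal D2 sketch: print the UTF-8 bytes of "josé".
import std.stdio : writefln;
import std.string : representation;

void main()
{
    // D string literals are UTF-8, so these are the bytes utf8.txt should hold.
    immutable(ubyte)[] bytes = "josé".representation;

    // 'é' (U+00E9) becomes the two bytes C3 A9. A reader that assumes
    // Windows-1252 shows C3 as 'Ã' and A9 as '©', hence "josÃ©".
    writefln("%(%02X %)", bytes);   // prints: 6A 6F 73 C3 A9
}
```

So the conversion itself looks right; it's only the reader's guess about the encoding that's wrong.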
Open utf8.txt in a hex editor (I like XVI32). If it doesn't start with EF BB
BF then that's probably the problem, and you'll need to change:
write(utf8, std.windows.charset.fromMBSz(t.ptr,0));
to:
write(utf8, x"EF BB BF" ~ std.windows.charset.fromMBSz(t.ptr,0));
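If you're on a current D2 compiler, here's a minimal sketch of the whole conversion that writes the BOM itself. Treat it as an illustration under assumptions, not a drop-in replacement for the program above. It assumes the input really is Windows-1252 (fromMBSz with code page 0 uses whatever the system's ANSI code page is). It also uses std.encoding's transcode instead of fromMBSz, partly because fromMBSz expects a NUL-terminated string and the buffer from std.file.read isn't guaranteed to end in one. The helper names ansiToUtf8WithBom and hasUtf8Bom are hypothetical.

```d
// Minimal D2 sketch: convert a Windows-1252 ("ANSI") file to UTF-8 with a BOM.
// Assumption: the input is Windows-1252, not some other ANSI code page.
import std.encoding : transcode, Windows1252String;
import std.file : read, write;

/// Decode Windows-1252 bytes and return UTF-8 text prefixed with the BOM.
/// (Hypothetical helper name.)
string ansiToUtf8WithBom(const(ubyte)[] ansiBytes)
{
    // Reinterpret the raw bytes as Windows-1252 code units (same size).
    auto src = cast(Windows1252String) ansiBytes.idup;
    string utf8;
    transcode(src, utf8);      // Windows-1252 -> UTF-8
    return "\uFEFF" ~ utf8;    // U+FEFF is written as EF BB BF in UTF-8
}

/// True if the buffer starts with the UTF-8 BOM. (Hypothetical helper name.)
bool hasUtf8Bom(const(ubyte)[] data)
{
    return data.length >= 3
        && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF;
}

void main(string[] args)
{
    // Usage: <program> input.txt output.txt
    if (args.length != 3)
        return;
    auto ansi = cast(const(ubyte)[]) read(args[1]);
    write(args[2], ansiToUtf8WithBom(ansi));
}
```

hasUtf8Bom does the same check as looking at the first three bytes in a hex editor, so you can also run it on the output of the original program.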
More information about the Digitalmars-d-learn
mailing list