ANSI to UTF8 problem
jicman
cabrera_ at _wrc.xerox.com
Mon Aug 16 19:24:42 PDT 2010
Nick Sabalausky Wrote:
> "jicman" <cabrera_ at _wrc.xerox.com> wrote in message
> news:i4cn8h$2vtn$1 at digitalmars.com...
> >
> > Greetings.
> >
> > I have this program,
> >
> > import std.stdio;
> > import juno.base.text;
> > import std.file;
> > import std.windows.charset;
> > import std.utf;
> >
> > int main(char[][] args)
> > {
> >     char[] ansi = r"c:\ansi.txt";
> >     char[] utf8 = r"c:\utf8.txt";
> >     try
> >     {
> >         char[] t = cast(char[]) read(ansi);
> >         write(utf8, std.windows.charset.fromMBSz(t.ptr,0));
> >         writefln(" converted to UTF8.");
> >     }
> >     catch (UtfException e)
> >     {
> >         writefln(" is not ANSI");
> >         return 1;
> >     }
> >     return(0);
> > }
> >
> > the ansi.txt file contains,
> >
> > josé
> > áéíóúñÑ
> >
> > the utf8.txt file when opened with Wordpad looks like this:
> >
> > josÃ©
> > Ã¡Ã©Ã­Ã³ÃºÃ±Ã‘
> >
> > The file did change from ANSI to UTF8; however, it displays wrong in
> > Wordpad. The problem is that there is one application that I am trying to
> > fill with these UTF8 files, and it displays the same problem as
> > Wordpad.
> >
> > Any help would be greatly appreciated.
> >
> > thanks,
> >
> > josé
>
> The utf8.txt file is probably missing the UTF-8 BOM (I'm not familiar with
> fromMBSz: I *assume* it doesn't add the BOM, but maybe I'm wrong?). Without
> that BOM, Wordpad is probably assuming it's "ASCII with some codepage"
> instead of UTF8.
>
> Open utf8.txt in a hex editor (I like XVI32). If it doesn't start with EF BB
> BF then that's probably the problem, and you'll need to change:
>
> write(utf8, std.windows.charset.fromMBSz(t.ptr,0));
>
> to:
>
> write(utf8, x"EF BB BF" ~ std.windows.charset.fromMBSz(t.ptr,0));
>
>
DOH! Yep! Thanks, Nick.
josé
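
For anyone finding this thread later: the BOM check Nick describes with a hex editor can also be done in code. The following is only a rough sketch, assuming D2 Phobos (the program quoted above is D1-style), and the helper name withUtf8Bom is made up for illustration. It works on raw bytes, so it does not depend on fromMBSz or on Windows:

```d
import std.file : read, write;

// UTF-8 byte order mark: EF BB BF
immutable ubyte[3] utf8Bom = [0xEF, 0xBB, 0xBF];

/// Hypothetical helper: returns a copy of `data` that starts with the
/// UTF-8 BOM. If `data` already begins with EF BB BF, nothing is added.
ubyte[] withUtf8Bom(const(ubyte)[] data)
{
    if (data.length >= 3 && data[0 .. 3] == utf8Bom[])
        return data.dup;

    // Build the result explicitly: 3 BOM bytes followed by the payload.
    auto result = new ubyte[](3 + data.length);
    result[0 .. 3] = utf8Bom[];
    result[3 .. $] = data[];
    return result;
}

void main()
{
    // Same path as in the thread; adjust as needed.
    auto path = r"c:\utf8.txt";
    auto bytes = cast(const(ubyte)[]) read(path);
    write(path, withUtf8Bom(bytes));
}
```

Running it twice on the same file should not add a second BOM, since the check comes first. Whether a BOM is wanted at all depends on the consuming application; Wordpad and the application mentioned above apparently rely on it.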