ANSI to UTF8 problem

jicman cabrera_ at _wrc.xerox.com
Mon Aug 16 19:24:42 PDT 2010


Nick Sabalausky Wrote:

> "jicman" <cabrera_ at _wrc.xerox.com> wrote in message 
> news:i4cn8h$2vtn$1 at digitalmars.com...
> >
> > Greetings.
> >
> > I have this program,
> >
> > import std.stdio;
> > import juno.base.text;
> > import std.file;
> > import std.windows.charset;
> > import std.utf;
> >
> > int main(char[][] args)
> > {
> >  char[] ansi = r"c:\ansi.txt";
> >  char[] utf8 = r"c:\utf8.txt";
> >  try
> >  {
> >    char[] t = cast(char[]) read(ansi);
> >    write(utf8, std.windows.charset.fromMBSz(t.ptr,0));
> >    writefln(" converted to UTF8.");
> >  }
> >  catch (UtfException e)
> >  {
> >    writefln(" is not ANSI");
> >    return 1;
> >  }
> >  return(0);
> > }
> >
> > the ansi.txt file contains,
> >
> > josé
> > áéíóúñÑ
> >
> > the utf8.txt file when opened with Wordpad looks like this:
> >
> > josé
> > áéíóúñÑ
> >
> > The file did change from ANSI to UTF8, however, it displays incorrectly in 
> > Wordpad.  The problem is that an application I am trying to feed these 
> > UTF8 files to shows the same problem as Wordpad.
> >
> > Any help would be greatly appreciated.
> >
> > thanks,
> >
> > josé
> 
> The utf8.txt file is probably missing the UTF-8 BOM (I'm not familiar with 
> fromMBSz: I *assume* it doesn't add the BOM, but maybe I'm wrong?). Without 
> that BOM, Wordpad is probably assuming it's "ASCII with some codepage" 
> instead of UTF8.
> 
> Open utf8.txt in a hex editor (I like XVI32). If it doesn't start with EF BB 
> BF then that's probably the problem, and you'll need to change:
> 
> write(utf8, std.windows.charset.fromMBSz(t.ptr,0));
> 
> to:
> 
> write(utf8, x"EF BB BF" ~ std.windows.charset.fromMBSz(t.ptr,0));
> 
> 
DOH!  Yep!  Thanks, Nick.
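
For anyone who finds this thread later, here is a minimal sketch of the BOM check and fix described above. It is written for current D (D2) and uses only Phobos, so it is not the exact D1 code from my program. The helper names hasUtf8Bom and withUtf8Bom are made up for illustration. The garbling itself comes from the UTF-8 bytes being read as a single-byte codepage: 'é' is stored as C3 A9, which shows up as "Ã©" when interpreted as Windows-1252.

```d
import std.stdio : writefln;

// The three-byte UTF-8 byte order mark (EF BB BF).
immutable ubyte[] utf8Bom = [0xEF, 0xBB, 0xBF];

/// Returns true if `data` begins with the UTF-8 BOM.
/// (Hypothetical helper, not part of Phobos.)
bool hasUtf8Bom(const(ubyte)[] data)
{
    return data.length >= utf8Bom.length
        && data[0 .. utf8Bom.length] == utf8Bom;
}

/// Returns `data` with the BOM prepended, unless it is already present.
/// (Hypothetical helper, not part of Phobos.)
immutable(ubyte)[] withUtf8Bom(immutable(ubyte)[] data)
{
    return hasUtf8Bom(data) ? data : utf8Bom ~ data;
}

void main()
{
    // "josé" as UTF-8 bytes; 'é' occupies two bytes (C3 A9).
    auto text = cast(immutable(ubyte)[]) "josé";
    auto marked = withUtf8Bom(text);

    // Dump the bytes in hex, the same view a hex editor would give.
    writefln("%(%02X %)", marked);
}
```

To apply this to a file, the result of withUtf8Bom could be passed to std.file.write in place of the raw converted text. Checking first avoids writing a second BOM if the input already carries one.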

josé


More information about the Digitalmars-d-learn mailing list