ANSI to UTF8 problem

Nick Sabalausky a at a.a
Mon Aug 16 18:58:21 PDT 2010


"jicman" <cabrera_ at _wrc.xerox.com> wrote in message 
news:i4cn8h$2vtn$1 at digitalmars.com...
>
> Greetings.
>
> I have this program,
>
> import std.stdio;
> import juno.base.text;
> import std.file;
> import std.windows.charset;
> import std.utf;
>
> int main(char[][] args)
> {
>  char[] ansi = r"c:\ansi.txt";
>  char[] utf8 = r"c:\utf8.txt";
>  try
>  {
>    char[] t = cast(char[]) read(ansi);
>    write(utf8, std.windows.charset.fromMBSz(t.ptr,0));
>    writefln(" converted to UTF8.");
>  }
>  catch (UtfException e)
>  {
>    writefln(" is not ANSI");
>    return 1;
>  }
>  return(0);
> }
>
> the ansi.txt file contains,
>
> josé
> áéíóúñÑ
>
> the utf8.txt file when opened with Wordpad looks like this:
>
> josé
> áéíóúñÑ
>
> The file did change from ANSI to UTF8; however, it displays incorrectly in 
> Wordpad.  The problem is that there is one application that I am trying to 
> feed these UTF8 files to, and it shows the same problem as Wordpad.
>
> Any help would be greatly appreciated.
>
> thanks,
>
> josé

The utf8.txt file is probably missing the UTF-8 BOM (I'm not familiar with 
fromMBSz; I *assume* it doesn't add the BOM, but I may be wrong). Without 
that BOM, Wordpad is probably treating the file as "ASCII with some codepage" 
instead of UTF-8.

Open utf8.txt in a hex editor (I like XVI32). If it doesn't start with EF BB 
BF then that's probably the problem, and you'll need to change:

write(utf8, std.windows.charset.fromMBSz(t.ptr,0));

to:

write(utf8, x"EF BB BF" ~ std.windows.charset.fromMBSz(t.ptr,0));
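
If you'd rather do the hex-editor check in code, here is a rough sketch. It 
assumes the bytes are already in memory as a ubyte[]. The names hasUtf8Bom 
and withUtf8Bom are made up for illustration; they are not Phobos functions. 
I have not compiled this against your setup.

```d
// Hypothetical helpers (not part of Phobos): detect and prepend the UTF-8 BOM.

// Returns true if data begins with the three BOM bytes EF BB BF.
bool hasUtf8Bom(ubyte[] data)
{
    return data.length >= 3
        && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF;
}

// Returns data itself if it already starts with a BOM,
// otherwise a new array holding the BOM followed by data.
ubyte[] withUtf8Bom(ubyte[] data)
{
    if (hasUtf8Bom(data))
        return data;

    ubyte[] result = new ubyte[data.length + 3];
    result[0] = 0xEF;
    result[1] = 0xBB;
    result[2] = 0xBF;
    result[3 .. $] = data[];   // copy the original bytes after the BOM
    return result;
}

void main()
{
    // Two plain ASCII bytes ("hi") standing in for real file contents.
    ubyte[] plain = new ubyte[2];
    plain[0] = 0x68;
    plain[1] = 0x69;

    ubyte[] marked = withUtf8Bom(plain);
    assert(!hasUtf8Bom(plain));
    assert(hasUtf8Bom(marked) && marked.length == 5);
    assert(withUtf8Bom(marked) is marked);   // already marked, returned as-is
}
```

In your program, that would mean casting the converted text to ubyte[] and 
passing it through withUtf8Bom before handing it to write, so the BOM never 
gets added twice.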



