Reading and writing Unicode files

jicman cabrera_ at _wrc.xerox.com
Sat Feb 28 20:34:40 PST 2009


Jarrett Billingsley Wrote:

> On Sat, Feb 28, 2009 at 1:40 AM, jicman wrote:
> >
> > Greetings.
> >
> > Sorry guys, please be patient with me.  I am having a hard time understanding these Unicode, ANSI, and UTF* ideas.  I know how to take a UTF-8 file and turn it into ANSI, and I know how to take an ANSI file and turn it into a UTF file.  But now I have a Unicode file, and I need to change the content and create a new Unicode file with the changes.  I have read all kinds of places, and I found mtext, from Chris Miller's site, by reading
> >
> > http://www.prowiki.org/wiki4d/wiki.cgi?DanielKeep/TextInD
> >
> > Anyway, what I need is to read a Unicode file, search the strings inside, make changes, and write the result back to a Unicode file.
> 
> You seem to be distinguishing between UTF and Unicode; it's kind of
> apples to oranges.  Unicode is a standard for character encoding (a
> mapping from numbers to characters, like ASCII).  UTF is a way - or
> rather, _several_ ways - of encoding Unicode text.  There are three
> major encodings, UTF-8, UTF-16, and UTF-32 (and the 16- and 32-bit
> encodings have both little- and big-endian versions), which correspond
> to D's char[], wchar[], and dchar[].
> 
> When you say a "Unicode" file do you mean it's encoded in UTF-16?  If
> so, you can just read the file's contents as a wchar[].  If you're
> using Phobos, keep in mind that it provides no functionality for
> searching or manipulating wchar[]s, which means you'll have to convert
> it to UTF-8 (char[]).  If you're using Tango, you can give
> tango.io.UnicodeFile a shot - it will automatically transcode a file
> from any Unicode encoding to any other, and if your file has a BOM, it
> can even automatically detect which encoding it's in.
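
The BOM-based detection described above can be sketched in a few lines. This is only an illustration, written in Python rather than D, and it is not how tango.io.UnicodeFile is implemented; the function name detect_bom is made up for this example. The one subtlety is that the UTF-32 little-endian mark begins with the same two bytes as the UTF-16 little-endian mark, so the longer marks have to be checked first.

```python
import codecs

# Longer marks first: the UTF-32 LE BOM (FF FE 00 00) starts with
# the UTF-16 LE BOM (FF FE), so order matters.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(data):
    """Return (encoding_name, bom_length), or (None, 0) if there is no BOM.

    Hypothetical helper for illustration; `data` is the raw bytes of a file.
    """
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0
```

A file without a BOM yields (None, 0), in which case the encoding has to be known some other way.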

OK, the only reason I say Unicode is that when I open the file in Notepad and do a Save As, the Encoding box says Unicode.  When I read this file and write it back to another file, the Encoding turns to UTF-8.  I want to keep it as Unicode.
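
For context, what Notepad calls "Unicode" is UTF-16 little-endian with a byte order mark, so keeping the file "as Unicode" means writing the BOM and UTF-16 LE bytes back out. Below is a minimal sketch of that round trip, written in Python rather than D only to show the steps; the function name and its parameters are hypothetical, and it assumes the input really does start with the UTF-16 LE BOM.

```python
import codecs


def rewrite_utf16_file(src_path, dst_path, old, new):
    """Replace `old` with `new` in a UTF-16 LE file, keeping the encoding.

    Hypothetical helper for illustration only.
    """
    with open(src_path, "rb") as f:
        raw = f.read()

    # Assumption: the file starts with the UTF-16 LE BOM (bytes FF FE).
    if not raw.startswith(codecs.BOM_UTF16_LE):
        raise ValueError("expected a UTF-16 LE byte order mark")

    # Strip the BOM, then decode the rest into an ordinary string.
    text = raw[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")

    # Search and edit on the decoded text, not on the raw bytes.
    text = text.replace(old, new)

    # Write the BOM back, followed by the text re-encoded as UTF-16 LE.
    with open(dst_path, "wb") as f:
        f.write(codecs.BOM_UTF16_LE + text.encode("utf-16-le"))
```

The same three steps (decode, edit, re-encode with the BOM) apply in D; only the library calls differ.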

I will give the suggestion a try; I have not tried it yet.  Maybe Phobos should think about handling the BOM and providing support for these encodings.  I am a big fan of Phobos. :-)  I have not tried Tango yet, because I would have to uninstall Phobos.  I have just spent two years using Phobos, we already have an application based on Phobos, and switching to Tango would slow us down and set us back.  Maybe version 2.0.

Thanks, Jarrett.

josé

