Reading and writing Unicode files

Jarrett Billingsley jarrett.billingsley at gmail.com
Sat Feb 28 08:07:34 PST 2009


On Sat, Feb 28, 2009 at 1:40 AM, jicman <cabrera_ at _wrc.xerox.com> wrote:
>
> Greetings.
>
> Sorry guys, please be patient with me.  I am having a hard time understanding these Unicode, ANSI, and UTF* ideas.  I know how to get a UTF-8 file and turn it into ANSI, and I know how to take an ANSI file and turn it into a UTF file.  But now I have a Unicode file, and I need to change its content and create a new Unicode file with the changes.  I have read all kinds of places, and I found mtext, from Chris Miller's site, by reading,
>
> http://www.prowiki.org/wiki4d/wiki.cgi?DanielKeep/TextInD
>
> Anyway, what I need is to read a Unicode file, search the strings inside, make changes, and write the changes back to a Unicode file.

You seem to be distinguishing between UTF and Unicode; that's kind of
apples to oranges.  Unicode is a character set standard: a mapping
between characters and numbers (code points), much like ASCII.  UTF is
a way - or rather, _several_ ways - of encoding that Unicode text as
bytes.  There are three major encodings, UTF-8, UTF-16, and UTF-32
(and the 16- and 32-bit encodings come in both little- and big-endian
versions), which correspond to D's char[], wchar[], and dchar[].
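
As a minimal sketch (assuming a D1-era Phobos, where std.utf provides
the toUTF8/toUTF16/toUTF32 conversion routines), the same text can be
held in any of the three array types:

    import std.utf;

    void main()
    {
        char[]  u8  = "hello";      // UTF-8: one or more bytes per character
        wchar[] u16 = toUTF16(u8);  // UTF-16: one or two 16-bit units per character
        dchar[] u32 = toUTF32(u8);  // UTF-32: exactly one 32-bit unit per character

        // and back to UTF-8 for Phobos's string routines:
        char[] back = toUTF8(u16);
    }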

When you say a "Unicode" file, do you mean it's encoded in UTF-16?  If
so, you can just read the file's contents as a wchar[].  If you're
using Phobos, keep in mind that it provides no functionality for
searching or manipulating wchar[]s, which means you'll have to convert
it to UTF-8 (char[]).  If you're using Tango, you can give
tango.io.UnicodeFile a shot - it will automatically transcode a file
from any Unicode encoding to any other, and if your file has a BOM, it
can even automatically detect which encoding it's in.
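
For the Phobos route, a rough sketch of the whole round trip might
look like the following (the file names and search/replace strings are
just placeholders, and it assumes the input is native-endian UTF-16
with no BOM - otherwise you'd have to strip the BOM or byte-swap
first):

    import std.file;    // read(), write()
    import std.string;  // replace()
    import std.utf;     // toUTF8(), toUTF16()

    void main()
    {
        // Read the raw bytes and treat them as UTF-16.
        wchar[] content = cast(wchar[]) std.file.read("input.txt");

        // Phobos's search/replace works on char[], so transcode to UTF-8 first.
        char[] text = toUTF8(content);
        text = replace(text, "old text", "new text");

        // Transcode back to UTF-16 and write the result out.
        std.file.write("output.txt", cast(void[]) toUTF16(text));
    }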

