reading an unicode file

Bill Baxter dnewsgroup at billbaxter.com
Thu May 10 21:57:54 PDT 2007


jicman wrote:
> Greetings!
> 
> I am reading this file into a char[][] array and all the data is broken down
> by a space.  So, if a line of data read has,
> 
> hi there folks!
> 
> the string contains,
> 
> h i  t h e r e  f o l k s !
> 
> I know this has to do with UTF8 and unicode, but how do I fix that?

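Yeah, the file is probably UTF-16 (often loosely called UCS-2) rather 
than UTF-8, meaning every char is 2 bytes (except characters outside the 
Basic Multilingual Plane, which take 4).  The things between the 
characters are probably not spaces, but rather null characters (0 bytes): 
for ASCII text, one byte of each 2-byte unit is always zero.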
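
To make the point above concrete, here is a minimal sketch that dumps the 
bytes of a short UTF-16 string.  The helper name hexDump is made up for 
this example, and the byte order shown assumes a little-endian machine.

```d
import std.format : format;
import std.stdio : writeln;

// Hypothetical helper: render bytes as space-separated hex pairs.
string hexDump(const(ubyte)[] bytes)
{
    return format("%(%02x %)", bytes);
}

void main()
{
    // "hi" as UTF-16: two code units of 2 bytes each.
    wstring text = "hi"w;

    // Reinterpret the code units as raw bytes (host byte order).
    auto bytes = cast(const(ubyte)[]) text;

    // On a little-endian machine this prints: 68 00 69 00
    writeln(hexDump(bytes));
}
```

Reading those four bytes as a char[] gives 'h', a null, 'i', a null, 
which is the "h i" pattern from the question.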

> Any help would be greatly appreciated.

Try reading it as binary and using the std.utf functions to convert?
Or maybe read it as wchars with the functions in std.stream (then convert 
to UTF-8 if necessary with the std.utf functions).
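
The first suggestion could look roughly like the sketch below.  It is 
written against current Phobos (std.file.read and std.utf.toUTF8) and 
assumes the file is little-endian UTF-16, possibly with a leading 
byte-order mark.  The function name utf16leToUtf8 is made up for this 
example, and the file name comes from the command line.

```d
import std.file : read;
import std.stdio : writeln;
import std.utf : toUTF8;

// Hypothetical helper: convert a UTF-16LE byte buffer to a UTF-8 string.
string utf16leToUtf8(const(ubyte)[] raw)
{
    // Skip a leading byte-order mark (FF FE) if present.
    if (raw.length >= 2 && raw[0] == 0xFF && raw[1] == 0xFE)
        raw = raw[2 .. $];

    // Assemble the 16-bit code units by hand so the result does not
    // depend on the host's byte order. A trailing odd byte is ignored.
    auto units = new wchar[raw.length / 2];
    foreach (i, ref u; units)
        u = cast(wchar)(raw[2 * i] | (raw[2 * i + 1] << 8));

    return toUTF8(units);
}

void main(string[] args)
{
    if (args.length < 2)
    {
        writeln("usage: pass the path of a UTF-16LE file");
        return;
    }

    // Read the whole file as raw bytes, then convert.
    auto raw = cast(const(ubyte)[]) read(args[1]);
    writeln(utf16leToUtf8(raw));
}
```

Once the text is UTF-8, splitting it into lines and words gives the 
char[][] data without the stray nulls.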

Never done this stuff myself, but that's where I'd look.

--bb

