Character recognition and output

Hasan Aljudy hasan.aljudy at gmail.com
Mon Nov 6 00:33:48 PST 2006



Tyro wrote:
> Wondering if someone can point me in the right direction on a small
> problem.
> 
> I'm attempting to parse(?) a file with the following
> string "�������������" embedded somewhere in it. When I try to
> output the information, however, writef() chokes if it comes across
> one of these characters. I thought that this was simply a writef
> [doFormat] problem so I tried to read the file using Christopher
> Miller's sample richtext viewer that accompanies DFL and the same
> thing happens (Error: 4invalid UTF-8 sequence). I tried different
> combinations of wchar[], dchar[], and byte[] but to no avail. How
> do I fix this?
> 
> import std.stdio: emitln = writefln, emit = writef;
> import std.file: exists, read;
> 
> void main (char[][] args)
> {
>   if (args.length == 2 && args[1].exists())
>   {
>     char[] file = cast(char[])args[1].read();
>     foreach(sizendx, char ch; file)
>     {
>       try { emit(ch); }             // terminates on �
>       catch { emit(" ");continue; }
>     }
>   }
>   else
>     emit ("usage is: ids filename");
> }
> 
> Andrew Edwards

This looks like an encoding problem to me.
Even my Mozilla Thunderbird client doesn't recognize the characters; it
prints little diamonds with a question mark inside (the encoding is set
to UTF-8).

I think the standard library is written to deal with Unicode text only,
so a char[] is expected to hold valid UTF-8.

If it's just one file (or a couple of them), the easiest way to
transcode it is probably to open it with Notepad, then save it
again with UTF-8 encoding.
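
If there are too many files to do by hand, the same conversion can be
sketched in code. This is only a rough sketch and rests on assumptions
the thread does not confirm: it is written against a current D compiler
and Phobos (not the 2006 library), it assumes the file is really
ISO-8859-1 (Latin-1), and the names latin1ToUtf8 and transcodeFile are
made up for illustration. If the file uses some other 8-bit code page,
the byte-to-character mapping will differ.

```d
import std.file : read, write;
import std.utf : encode, validate, UTFException;

/// Hypothetical helper: treat every input byte as one ISO-8859-1
/// (Latin-1) character and re-encode it as UTF-8.
char[] latin1ToUtf8(const(ubyte)[] raw)
{
    char[] result;
    foreach (b; raw)
    {
        // Latin-1 byte values equal the first 256 Unicode code points,
        // so the byte can be widened to a dchar and encoded directly.
        encode(result, cast(dchar) b);
    }
    return result;
}

/// Hypothetical helper: copy inPath to outPath, converting from the
/// ASSUMED Latin-1 encoding only when the input is not valid UTF-8.
void transcodeFile(string inPath, string outPath)
{
    auto raw = cast(ubyte[]) read(inPath);
    try
    {
        // validate throws UTFException on an invalid UTF-8 sequence.
        validate(cast(char[]) raw);
        write(outPath, raw);               // already UTF-8, copy as-is
    }
    catch (UTFException e)
    {
        write(outPath, latin1ToUtf8(raw)); // assumed Latin-1 input
    }
}
```

Calling transcodeFile(args[1], args[2]) from main would convert one
file. Reading the data as ubyte[] rather than char[] matters here,
because nothing tries to decode the bytes as UTF-8 until validate is
called.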



More information about the Digitalmars-d-learn mailing list