UTF-8 problems

Mon Jun 12 10:31:33 PDT 2006

Deewiant wrote:
> 
> Thanks for the explanation. Unfortunately, I'm not knowledgeable enough in these
> matters to correct the problem.
> 
> So, for instance, "c3 a4" is the UTF-8 equivalent of U+00E4, "ä". How do I
> combine the former two into a single "char"?
> 
> Say I check if the char received from getc() is greater than 127 (outside ASCII)
> and if it is, I store it and the following char in two ubytes. Now what? How do
> I get a char?

Keep using readLine. The entire line it returns should already be made of valid
UTF-8, so you don't need to assemble the bytes yourself.
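
To illustrate with the two bytes from the question: in D a char is a single UTF-8 code unit, so "ä" never fits in one char. It either stays as two chars inside a char[] or gets decoded into one dchar. Below is a minimal sketch, assuming a current D compiler and Phobos' std.utf.decode; firstCodePoint is just an illustrative name, not an existing function.

```d
import std.utf : decode;

// Illustrative helper (not from the original post): decode the first
// code point of a byte buffer.
dchar firstCodePoint(const(ubyte)[] raw)
{
    // A char is one UTF-8 code unit, so the bytes can be viewed as chars directly.
    auto text = cast(const(char)[]) raw;
    size_t i = 0;
    return decode(text, i); // throws UTFException on malformed input
}

void main()
{
    // The two bytes from the question, as they would arrive from getc().
    ubyte[] raw = [0xC3, 0xA4];
    assert(firstCodePoint(raw) == 0x00E4);

    // foreach with a dchar loop variable decodes a whole line the same way.
    size_t count = 0;
    foreach (dchar d; cast(const(char)[]) raw)
        ++count;
    assert(count == 1);
}
```

So for a line obtained from readLine, looping with foreach (dchar c; line) should be enough to see whole characters rather than individual bytes.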

One way to address this might be to add getUTF8char, getUTF16char and
getUTF32char, returning char[], wchar[] and dchar respectively. The first would
return an array of 1 to 4 elements and the second an array of 1 or 2.
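
For what it's worth, here is a rough sketch of what such a getUTF8char could look like. Everything in it is an assumption made for illustration: the function does not exist in any library, and the delegate parameter `next` merely stands in for getc(), returning the next byte or -1 at end of input. It only checks the lead byte and the continuation-byte pattern, so stricter validation (overlong forms, surrogates) is left to std.utf.decode.

```d
import std.utf : decode;

// Hypothetical helper modelled on the getUTF8char idea above.
// `next` stands in for getc(): it returns the next byte, or -1 at end of input.
char[] getUTF8char(int delegate() next)
{
    int first = next();
    if (first < 0)
        return null; // end of input

    // The lead byte tells how many code units the sequence has.
    size_t len;
    if      ((first & 0x80) == 0x00) len = 1; // 0xxxxxxx: ASCII
    else if ((first & 0xE0) == 0xC0) len = 2; // 110xxxxx
    else if ((first & 0xF0) == 0xE0) len = 3; // 1110xxxx
    else if ((first & 0xF8) == 0xF0) len = 4; // 11110xxx
    else throw new Exception("invalid UTF-8 lead byte");

    char[] result = new char[len];
    result[0] = cast(char) first;
    foreach (k; 1 .. len)
    {
        int b = next();
        if (b < 0 || (b & 0xC0) != 0x80) // continuation bytes are 10xxxxxx
            throw new Exception("invalid UTF-8 continuation byte");
        result[k] = cast(char) b;
    }
    return result;
}

void main()
{
    // "ä" (c3 a4) followed by "A", served one byte at a time like getc() would.
    ubyte[] input = [0xC3, 0xA4, 0x41];
    size_t pos = 0;
    int next() { return pos < input.length ? input[pos++] : -1; }

    char[] a = getUTF8char(&next);
    assert(a.length == 2);

    // A getUTF32char could simply decode the result into a single dchar.
    size_t i = 0;
    assert(decode(a, i) == 0x00E4);

    assert(getUTF8char(&next) == "A");
    assert(getUTF8char(&next) is null);
}
```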

-- 
Carlos Santander Bernal