Working with utf

Thu Jun 14 06:04:43 PDT 2007

On Thu, 14 Jun 2007 14:40:02 +0200, Simen Haugen wrote:

> I hate it!
> 
> Say we have a string "øl". When I read this from a text file, it is two
> chars, but since this is not a UTF-8 string, I have to convert it to UTF-8
> before I can do any string operations on it.
> I can easily live with that. Say we have a file with several lines, and it's
> important that all lines are of equal length.
> The string "ol" is two chars, but the string "øl" is 3 chars in UTF-8.
> Because of this I have to convert it back to Latin-1 before checking
> lengths. The same applies to slicing, but even worse.
> As far as I'm concerned, "ø" is one character, not two. If I slice "ø" to get
> the first character, I only get the first half of the character. Wouldn't it
> be more natural for string manipulation to treat every UTF-8 character as one
> character, instead of two for values greater than 127?
> 
> I cannot find any nice solution for this, and have to convert to and from
> Latin-1/UTF-8 all the time.
> 
> There must be a better way...

Convert to UTF-32 (dchar[]), do your string work, and convert back to Latin-1
when you're done. Each dchar[] element holds one complete code point, so "ø"
is a single element and both lengths and slices come out the way you expect.
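
A minimal sketch of that approach, assuming a current Phobos where std.utf
provides toUTF32 and toUTF8, and assuming the text has already been converted
from Latin-1 to UTF-8. The helpers codePointCount and firstChars are
hypothetical names made up for this illustration. Note that a dchar is one
code point, so a letter built from a base character plus a combining mark
would still occupy more than one element.

```d
import std.stdio : writeln;
import std.utf : toUTF8, toUTF32;

// Hypothetical helper: number of code points in a UTF-8 string.
// toUTF32 throws a UTFException if the input is not valid UTF-8.
size_t codePointCount(string s)
{
    return toUTF32(s).length;
}

// Hypothetical helper: the first n code points of s, returned as UTF-8.
string firstChars(string s, size_t n)
{
    dstring wide = toUTF32(s);        // one element per code point
    if (n > wide.length)
        n = wide.length;              // clamp instead of slicing out of range
    return toUTF8(wide[0 .. n]);      // the slice falls on code point boundaries
}

void main()
{
    string beer = "øl";               // source file must be saved as UTF-8

    writeln(beer.length);             // UTF-8 code units: 'ø' counts twice
    writeln(codePointCount(beer));    // code points: 'ø' counts once
    writeln(firstChars(beer, 1));     // the whole 'ø', not half of it
}
```

For the equal-length check, comparing codePointCount of each line avoids the
round trip through Latin-1. The conversion copies the string, so for large
files it may be worth converting each line once and keeping the dchar[] around.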

-- 
Derek Parnell
Melbourne, Australia
"Justice for David Hicks!"
skype: derek.j.parnell