Working with utf

Simen Haugen simen at norstat.no
Thu Jun 14 05:40:02 PDT 2007


I hate it!

Say we have the string "øl". When I read it from a text file, it is two 
chars, but since it is not a UTF-8 string, I have to convert it to UTF-8 
before I can do any string operations on it.
I can easily live with that. But say we have a file with several lines, and 
it's important that all lines are of equal length.
The string "ol" is two chars, but the string "øl" is three chars in UTF-8, 
because "ø" is encoded as two bytes. Because of this I have to convert it 
back to Latin-1 before checking lengths. The same applies to slicing, only 
worse.
For all I care, "ø" is one character, not two. If I slice "ø" to get the 
first character, I only get the first half of the character. Wouldn't it be 
more obvious for all string manipulation to treat each UTF-8 character as 
one character, instead of two for values greater than 127?
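
To make the length and slicing problem concrete, here is a minimal sketch in 
D. It assumes a current Phobos where std.utf provides count, toUTF32 and 
toUTF8, and that the source file itself is saved as UTF-8. The helper names 
codePointCount and firstCodePoints are hypothetical, made up for this 
illustration.

```d
import std.stdio : writeln;
import std.utf : count, toUTF32, toUTF8;

// Hypothetical helper: number of code points, not UTF-8 code units.
size_t codePointCount(string s)
{
    return count(s);
}

// Hypothetical helper: the first n code points of s, returned as UTF-8.
// Assumes n does not exceed the number of code points in s.
string firstCodePoints(string s, size_t n)
{
    dstring d = toUTF32(s);   // one array element per code point
    return toUTF8(d[0 .. n]); // slice by code point, then re-encode
}

void main()
{
    string s = "øl";
    writeln(s.length);              // 3: UTF-8 code units (bytes)
    writeln(codePointCount(s));     // 2: code points
    writeln(firstCodePoints(s, 1)); // ø
}
```

Slicing the string directly with s[0 .. 1] would instead return only the 
first byte of "ø", which is the half-character described above.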

I cannot find any nice solution for this, and have to convert back and 
forth between Latin-1 and UTF-8 all the time.
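
One possible way to cut down on the repeated conversions, sketched here 
under assumptions rather than offered as the definitive answer, is to decode 
the Latin-1 bytes once into a dstring (one element per code point), do all 
length checks and slicing there, and encode back only when writing. The 
helper names latin1ToUTF32 and utf32ToLatin1 are hypothetical; the sketch 
relies only on the fact that each Latin-1 byte has the same value as its 
Unicode code point.

```d
import std.exception : enforce;
import std.stdio : writeln;

// Hypothetical helper: each Latin-1 byte maps to the code point
// with the same numeric value (0x00 .. 0xFF).
dstring latin1ToUTF32(const(ubyte)[] bytes)
{
    auto result = new dchar[bytes.length];
    foreach (i, b; bytes)
        result[i] = cast(dchar) b;
    return result.idup;
}

// Hypothetical helper: the reverse mapping. Throws if a character
// has no Latin-1 representation.
ubyte[] utf32ToLatin1(dstring text)
{
    auto result = new ubyte[text.length];
    foreach (i, c; text)
    {
        enforce(c <= 0xFF, "character not representable in Latin-1");
        result[i] = cast(ubyte) c;
    }
    return result;
}

void main()
{
    // "øl" as it would appear in a Latin-1 file: 0xF8 is 'ø', 0x6C is 'l'.
    ubyte[] raw = [0xF8, 0x6C];
    dstring line = latin1ToUTF32(raw);

    writeln(line.length);  // 2: one element per character
    writeln(line[0 .. 1]); // ø: slicing cannot split a character
    assert(utf32ToLatin1(line) == raw);
}
```

The trade-off is memory: a dstring uses four bytes per character, in 
exchange for length and slicing that work per code point.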

There must be a better way...




