UTF-8 issues

Chris R. Miller lordsauronthegreat at gmail.com
Mon Sep 15 11:38:34 PDT 2008


Eldar Insafutdinov wrote:
> I faced some issues with utf-8 support in D.
> As stated in http://www.digitalmars.com/d/2.0/cppstrings.html, strings support slicing and length calculation. Since strings are char arrays, this is correct only for Latin strings. So when a string contains, for example, Cyrillic chars, the length is wrong, and indexing and slicing don't work either.
> But foreach works correctly, so UTF-8 support is partial. Maybe there are functions in the standard library that do this work? I checked the new D2 features and found no improvement to UTF-8 support. Am I wrong?

IIRC a char array in D holds UTF-8 code units, so an ASCII character
takes one element while other characters (Cyrillic, for example) take
two or more.  The length property is still valid in terms of how many
code units the array holds, but in terms of real characters it's no
longer valid.  Indexing and slicing count code units in the same way.

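If you used a wchar or dchar array things would be different.  A dchar
holds a whole code point (UTF-32), so length and indexing work per
character.  A wchar (UTF-16) fits Cyrillic in a single element, but
characters outside the Basic Multilingual Plane still need two.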
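
Here is a rough sketch of the difference, assuming a current D2
compiler with Phobos' std.utf (count and toUTF32) and a source file
saved as UTF-8.  countByForeach is just a name made up for this
example, not something from the library.

```d
import std.stdio : writeln;
import std.utf : count, toUTF32;

// Hypothetical helper: counts code points by letting foreach
// decode the UTF-8 string into dchar values.
size_t countByForeach(string s)
{
    size_t n = 0;
    foreach (dchar c; s)
        ++n;
    return n;
}

void main()
{
    string s = "привет";           // 6 Cyrillic letters, 2 bytes each in UTF-8

    writeln(s.length);             // 12: UTF-8 code units, not characters
    writeln(count(s));             // 6: code points
    writeln(countByForeach(s));    // 6: foreach decodes on the fly

    dstring d = toUTF32(s);        // one dchar per code point
    writeln(d.length);             // 6
    writeln(d[0]);                 // п
}
```

So for per-character length, indexing and slicing, converting to a
dstring first is probably the simplest route, at the cost of four
bytes per character.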



More information about the Digitalmars-d mailing list