string types: const(char)[] and cstring
Frits van Bommel
fvbommel at REMwOVExCAPSs.nl
Tue May 29 13:22:47 PDT 2007
Regan Heath wrote:
> Aziz K. Wrote:
>> On Tue, 29 May 2007 20:41:31 +0200, Regan Heath <regan at netmail.co.nz>
>> wrote:
>>> and the result will be a correctly reversed UTF8 string. Or am I
>>> missing something?
>>>
>>> Regan Heath
>> I think your method doesn't take compound characters into account.
>>
>> For example:
>> // The accented é can be represented by a single code-point. But let's
>> // assume it's a compound character (Ce`a).
>
> Is it a compound character in UTF32?
Unicode defines multiple valid representations for lots of accented
characters, and that holds in UTF-32 too: typically a single precomposed
codepoint, as well as a sequence of separate codepoints for the "naked"
character and the accent that combine when put together.
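
To make the two representations concrete, here is a minimal sketch. It
assumes a current D2 compiler whose Phobos provides std.uni.normalize,
which is my assumption and not something this thread relies on. It only
shows that the precomposed and the combining form of é are different
codepoint sequences that normalization maps onto each other.

```d
import std.uni : normalize, NFC, NFD;

void main()
{
    // Precomposed form: U+00E9 LATIN SMALL LETTER E WITH ACUTE.
    dstring precomposed = "\u00E9"d;
    // Combining form: U+0065 'e' followed by U+0301 COMBINING ACUTE ACCENT.
    dstring decomposed = "e\u0301"d;

    // Same character to a reader, different codepoint sequences.
    assert(precomposed.length == 1);
    assert(decomposed.length == 2);
    assert(precomposed != decomposed);

    // Normalization converts between the two forms explicitly.
    assert(normalize!NFC(decomposed) == precomposed);
    assert(normalize!NFD(precomposed) == decomposed);
}
```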
>> writefln( toUTF8(toUTF32("Céa").reverse) ) // would reverse to a`eC
>> // This would print áeC
>
> Can you code that test up (using the \U character literal syntax so that the web interface doesn't mangle it)? I'd like to play with it.
>
> My statement was based on the assumption that converting UTF8 to UTF32 would result in all the compound characters being converted/represented by a single UTF32 codepoint each and would therefore be reversible.
I don't think the std.utf.toUTF* functions combine or split accented
characters; I'm pretty sure they just convert between codepoint
representations (keeping the number of codepoints constant).
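
A rough sketch of the requested test, using \u escapes so nothing gets
mangled in transit. It assumes a current D2 Phobos (std.utf.toUTF8 and
toUTF32, plus std.algorithm.mutation.reverse in place of the old
.reverse array property), and reversedCodepoints is a hypothetical
helper name of mine. It checks that the conversion keeps the combining
accent as its own codepoint, so a codepoint-wise reversal moves the
accent onto the 'a'.

```d
import std.algorithm.mutation : reverse;
import std.utf : toUTF8, toUTF32;

// Hypothetical helper: reverse a UTF-8 string codepoint by codepoint.
dstring reversedCodepoints(string s)
{
    dchar[] buf = toUTF32(s).dup; // mutable copy of the UTF-32 form
    reverse(buf);                 // in-place reversal of codepoints
    return buf.idup;
}

void main()
{
    // "Céa" with the accent stored as a separate combining codepoint.
    string s = "Ce\u0301a";

    // toUTF32 does not merge 'e' + U+0301 into U+00E9: four codepoints.
    assert(toUTF32(s).length == 4);

    // After reversal the accent follows 'a' instead of 'e'.
    dstring r = reversedCodepoints(s);
    assert(r == "a\u0301eC"d);
    assert(toUTF8(r) == "a\u0301eC");
}
```

Normalizing to the precomposed form before reversing would handle this
particular string, but not combining sequences that have no precomposed
codepoint.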