string types: const(char)[] and cstring

Frits van Bommel fvbommel at REMwOVExCAPSs.nl
Tue May 29 13:22:47 PDT 2007


Regan Heath wrote:
> Aziz K. Wrote:
>> On Tue, 29 May 2007 20:41:31 +0200, Regan Heath <regan at netmail.co.nz>  
>> wrote:
>>> and the result will be a correctly reversed UTF8 string.  Or am I  
>>> missing something?
>>>
>>> Regan Heath
>> I think your method doesn't take compound characters into account.
>>
>> For example:
>> // The accented é can be represented by a single code-point. But let's
>> // assume it's a compound character (Ce`a).
> 
> Is it a compound character in UTF32?

Unicode defines multiple valid representations for many accented
characters: typically a single precomposed codepoint, as well as separate
codepoints for the "naked" base character and a combining accent that
combine when put together.
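A minimal sketch in present-day D of the two representations of "Céa". It assumes a current Phobos, because std.uni.normalize was added long after this 2007 thread. The precomposed form uses U+00E9. The decomposed form uses 'e' followed by U+0301 COMBINING ACUTE ACCENT. The two forms have different codepoint counts but are canonically equivalent, so normalizing one produces the other.

```d
import std.stdio : writeln;
import std.uni : normalize, NFC, NFD;
import std.utf : toUTF32;

void main()
{
    // Precomposed: U+00E9 LATIN SMALL LETTER E WITH ACUTE (one code point).
    string precomposed = "C\u00E9a";
    // Decomposed: 'e' followed by U+0301 COMBINING ACUTE ACCENT (two code points).
    string decomposed = "Ce\u0301a";

    // Different numbers of code points...
    writeln(toUTF32(precomposed).length); // 3
    writeln(toUTF32(decomposed).length);  // 4

    // ...but canonically equivalent: NFD splits, NFC recombines.
    writeln(normalize!NFD(precomposed) == decomposed); // true
    writeln(normalize!NFC(decomposed) == precomposed); // true
}
```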

>> writefln( toUTF8(toUTF32("Céa").reverse) ) // would reverse to a`eC
>> // This would print áeC
> 
> Can you code that test up (using the \U character literal syntax so that the web interface doesn't mangle it)? I'd like to play with it.
> 
> My statement was based on the assumption that converting UTF8 to UTF32 would result in all the compound characters being converted/represented by a single UTF32 codepoint each and would therefore be reversible.

I don't think std.utf.toUTF* combines or splits accented characters; I'm
pretty sure it just converts between codepoint representations (keeping
the number of codepoints constant).
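A sketch of what this means for reversal, written against a current Phobos. This is an assumption: std.uni.byGrapheme is much newer than this thread, and std.algorithm's reverse stands in for the old built-in .reverse array property. The helper names reverseCodepoints and reverseGraphemes are hypothetical. Reversing the UTF-32 codepoints of the decomposed "Ce\u0301a" moves the combining accent onto the 'a'. Reversing whole grapheme clusters (user-perceived characters) keeps it on the 'e'.

```d
import std.algorithm.mutation : reverse;
import std.array : array;
import std.stdio : writeln;
import std.uni : byGrapheme;
import std.utf : toUTF8, toUTF32;

// Hypothetical helper: reverse code points only.
// toUTF32 yields one dchar per code point and does not compose 'e' + U+0301.
string reverseCodepoints(string s)
{
    dchar[] cps = toUTF32(s).dup; // mutable copy of the code points
    reverse(cps);
    return toUTF8(cps);
}

// Hypothetical helper: reverse grapheme clusters, keeping each cluster's
// code points (base character + combining marks) in their original order.
string reverseGraphemes(string s)
{
    dchar[] result;
    auto clusters = s.byGrapheme.array;
    foreach_reverse (ref g; clusters)
        foreach (i; 0 .. g.length)
            result ~= g[i];
    return toUTF8(result);
}

void main()
{
    string s = "Ce\u0301a"; // C, e, U+0301 COMBINING ACUTE ACCENT, a
    writeln(reverseCodepoints(s)); // accent now follows 'a'
    writeln(reverseGraphemes(s));  // accent still follows 'e'
}
```

Grapheme segmentation costs more than a plain array reverse. However, it is the level at which "reverse the characters" matches what a reader sees.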



More information about the Digitalmars-d-announce mailing list