Why string alias is invariant ?

Frits van Bommel fvbommel at REMwOVExCAPSs.nl
Thu Jan 31 16:03:14 PST 2008


Janice Caron wrote:
> It's not possible, even in principle, to lowercase a char[] in place,
> because a char[] by definition is an array of UTF-8 code units, /not/
> an array of characters. Lowercasing a character may result in the
> length of its UTF-8 sequence changing. If the length increases, you're
> screwed.
> 
> You can lowercase a dchar[] in place, but not a char[].

I'm not sure whether that's true for lowercasing[2].
However, I *am* sure it's *not* true for uppercasing, even with a 
dchar[]: some code points expand to 2 or 3 code points when uppercased. 
One common case is U+00DF "ß", LATIN SMALL LETTER SHARP S, which expands 
to "SS" (two characters) when uppercased[1]. Another example from the 
Unicode standard, U+0390, GREEK SMALL LETTER IOTA WITH DIALYTIKA AND 
TONOS, apparently expands to three code points.
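
For anyone who wants to check the numbers, here is a rough sketch in 
Python rather than D. It assumes Python's built-in str.upper() and 
str.lower() are a fair stand-in for the Unicode full case mappings, and 
it says nothing about what any D library function does. The helper 
name lengths() is made up for this example. The third sample, U+0130, 
is my own addition to probe the lowercasing direction.

```python
# Illustration only: Python's str.upper()/str.lower() stand in for the
# Unicode full case mappings. This is not D or Phobos code.

def lengths(s):
    """Return (number of code points, number of UTF-8 code units) for s."""
    return len(s), len(s.encode("utf-8"))

# (label, input character, case mapping to apply)
samples = [
    ("U+00DF upper", "\u00df", str.upper),  # LATIN SMALL LETTER SHARP S
    ("U+0390 upper", "\u0390", str.upper),  # GREEK SMALL LETTER IOTA WITH DIALYTIKA AND TONOS
    ("U+0130 lower", "\u0130", str.lower),  # LATIN CAPITAL LETTER I WITH DOT ABOVE
]

for label, ch, mapping in samples:
    mapped = mapping(ch)
    # Print the lengths before and after, plus the resulting code points.
    print(label, lengths(ch), "->", lengths(mapped),
          [f"U+{ord(c):04X}" for c in mapped])
```

If the code point count grows, an in-place conversion of a dchar[] 
cannot work for that input. If the UTF-8 code unit count grows, the 
same holds for a char[].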


[1] Interestingly though, the UTF-8 (aka char[]) representation is the 
same length (two code units either way) :P.

[2] The relevant section[3] of the Unicode standard says "Case mappings 
may produce strings of different lengths than the original." but 
proceeds to only give examples for uppercasing.

[3] Section 5.18, see 
http://www.unicode.org/versions/Unicode5.0.0/ch05.pdf#G21180


