Why string alias is invariant ?
Frits van Bommel
fvbommel at REMwOVExCAPSs.nl
Thu Jan 31 16:03:14 PST 2008
Janice Caron wrote:
> It's not possible, even in principle, to lowercase a char[] in place,
> because a char[] by definition is an array of UTF-8 code units, /not/
> an array of characters. Lowercasing a character may result in the
> length of its UTF-8 sequence changing. If the length increases, you're
> screwed.
>
> You can lowercase a dchar[] in place, but not a char[].
I'm not sure if that's true[2].
However, I *am* sure it's *not* true for uppercasing. Some code points
expand to two or three code points when uppercased. One common case is
U+00DF "ß", LATIN SMALL LETTER SHARP S, which expands to "SS" (two code
points) when uppercased[1]. Another example from the Unicode standard,
U+0390, GREEK SMALL LETTER IOTA WITH DIALYTIKA AND TONOS, expands to
three code points (U+0399 U+0308 U+0301).
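For anyone who wants to check these numbers, here is a minimal sketch in Python rather than D. The helper names `lengths` and `report` are made up for illustration, and it assumes a Python 3 whose `str.upper()` applies the full Unicode case mappings (not just the one-to-one ones).

```python
# Sketch: compare code-point count and UTF-8 length before and after
# uppercasing. Helper names are hypothetical, for illustration only.

def lengths(s):
    """Return (number of code points, number of UTF-8 code units) for s."""
    return len(s), len(s.encode("utf-8"))

def report(ch):
    """Return the lengths of ch and of its uppercased form."""
    upper = ch.upper()  # assumption: full case mapping is applied here
    return lengths(ch), lengths(upper)

sharp_s = "\u00df"  # LATIN SMALL LETTER SHARP S
iota = "\u0390"     # GREEK SMALL LETTER IOTA WITH DIALYTIKA AND TONOS

print(report(sharp_s))  # code points grow, UTF-8 length stays the same
print(report(iota))     # both code points and UTF-8 length grow
```

So "ß" happens to keep its UTF-8 length, but U+0390 does not: an in-place uppercase of a char[] would need more room than the original for that one.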
[1] Interestingly though, the UTF-8 (aka char[]) representation is the
same length: "ß" takes two code units, and so does "SS" :P.
[2] The relevant section[3] of the Unicode standard says "Case mappings
may produce strings of different lengths than the original." but
proceeds to only give examples for uppercasing.
[3] Section 5.18, see
http://www.unicode.org/versions/Unicode5.0.0/ch05.pdf#G21180
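On the lowercasing question from [2], a similar hedged sketch in Python can be used to probe individual code points. The helper names are again made up, and it assumes `str.lower()` applies the full Unicode case mappings. As far as I know, U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE is one code point that gets longer when lowercased under those mappings.

```python
# Sketch: does lowercasing a code point make its UTF-8 form longer?
# Helper names are hypothetical, for illustration only.

def utf8_len(s):
    """Number of UTF-8 code units needed to encode s."""
    return len(s.encode("utf-8"))

def grows_when_lowercased(ch):
    """True if the lowercased form needs more UTF-8 code units than ch."""
    return utf8_len(ch.lower()) > utf8_len(ch)

dotted_i = "\u0130"  # LATIN CAPITAL LETTER I WITH DOT ABOVE

# Code points after lowercasing, then UTF-8 length before and after.
print(len(dotted_i.lower()), utf8_len(dotted_i), utf8_len(dotted_i.lower()))
print(grows_when_lowercased(dotted_i))
```

If that holds for the Unicode version your library implements, then in-place lowercasing of a char[] can run out of room as well.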