std.stringbuffer

Frits van Bommel fvbommel at REMwOVExCAPSs.nl
Wed Apr 30 16:00:24 PDT 2008


Janice Caron wrote:
> 2008/4/30 Me Here <p9e883002 at sneakemail.com>:
>>     char[] a = ...2000 chars from somewhere.
>>
>>     char[] field1 = a[ 312 .. 357 ];
>>     field1.toUpper();
> 
> I've kind of lost track of the number of times I've said this in
> recent days, but...
> 
> You cannot uppercase in place, because for any given dchar, c, the
> number of UTF-8 bytes required to express c may be different from the
> number of UTF-8 bytes required to express toupper(c).
> 
> If any of you have plans to uppercase or lowercase UTF-8 in place,
> forget that now. It just ain't possible. (You can uppercase ASCII,
> UTF-16, or UTF-32 in place. But not UTF-8, and char[], by definition,
> is UTF-8).

Actually, you can't uppercase UTF-16 and UTF-32 in place either if you 
want to be entirely correct. For example: \u00df ("ß") --> \u0053 \u0053 
("SS"). One code unit becomes two, which doubles the byte count in both 
UTF-16 and UTF-32.
(This particular case does work in place for UTF-8, though, since \u00df 
happens to require 2 UTF-8 code units and each \u0053 only one.)
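
A quick way to check these byte counts is a small Python sketch (Python 
only because its str.upper() applies the full Unicode case mappings; the 
helper name encoded_sizes is made up for illustration, and the "-le" 
codecs are used so no BOM is counted). It also covers a case of the kind 
Janice describes, where the UTF-8 length changes: \u0131 ("ı") --> 
\u0049 ("I").

```python
# Sketch: compare encoded sizes before and after full-Unicode uppercasing.
# str.upper() uses the full case mappings, so U+00DF becomes "SS".
# encoded_sizes is a hypothetical helper, not part of any library.

def encoded_sizes(text):
    """Return {encoding: (bytes_before, bytes_after_upper)}."""
    upper = text.upper()
    return {
        enc: (len(text.encode(enc)), len(upper.encode(enc)))
        for enc in ("utf-8", "utf-16-le", "utf-32-le")
    }

sharp_s = encoded_sizes("\u00df")    # "ß" uppercases to "SS"
dotless_i = encoded_sizes("\u0131")  # "ı" uppercases to "I"

for name, sizes in (("U+00DF", sharp_s), ("U+0131", dotless_i)):
    for enc, (before, after) in sizes.items():
        # An in-place conversion is only possible when before == after.
        print(name, enc, before, "->", after)
```

For \u00df the UTF-8 size stays at 2 bytes while UTF-16 and UTF-32 
double; for \u0131 it is the other way around, with UTF-8 shrinking from 
2 bytes to 1 and the other two unchanged.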

(See <http://www.unicode.org/Public/UNIDATA/SpecialCasing.txt> for what 
should be a complete list of characters with similarly annoying casing 
properties.)


