RFC: Case-Insensitive Strings (And usually they really do *have* case)
Nick Sabalausky
a at a.a
Mon Jan 10 14:00:40 PST 2011
"Daniel Gibson" <metalcaedes at gmail.com> wrote in message
news:igfttf$p46$1 at digitalmars.com...
> On 10.01.2011 22:16, Michel Fortin wrote:
>>
>> Comparing the lowercase version of two strings works well for ASCII, but I
>> doubt it works very well for Unicode. Case conversion is not bidirectional
>> (for instance both 'SS' and 'ß' become 'ss' in lowercase in German),
>
> That's wrong, 'ß' is lowercase and no upper-case version is used really,
> though one exists in Unicode (see:
> http://en.wikipedia.org/wiki/Capital_%C3%9F ).
> Sometimes, when stuff is written in fullcaps, 'ß' (which never is the
> first character of a word) is replaced by "SS", but I wouldn't expect that
> to be equal on icmp(). (e.g. "Strings vergleichen macht keinen Spaß!" vs
> "STRINGS VERGLEICHEN MACHT KEINEN SPASS!")
>
> Anyway, in this case comparing in lowercase would cause no trouble at all
> (comparing in uppercase however would, if you don't use the
> not-really-existing-but-defined-by-unicode-Capital-ß).
>
> I don't know if there may be problems with special characters in other
> languages, though.
>
One of the Unicode documents mentions an example involving the three Greek
"sigma" letters, although I never quite understood how it demonstrated the
inadequacy of using lower-case:
http://www.unicode.org/reports/tr21/tr21-5.html#Caseless_Matching
...Which references some information near the end of this sub-section:
http://www.unicode.org/reports/tr21/tr21-5.html#Introduction
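For what it's worth, here is a small sketch of what I take the sigma example
to be getting at. It is in Python rather than D, purely because Python's
standard library already exposes case folding (str.casefold); the helper
names lower_equal and fold_equal are made up for illustration. The two
lowercase sigmas both uppercase to the same capital letter, yet lowercasing
leaves them as two different code points, so a lowercase-based comparison
treats them as unequal.

```python
# The three Greek sigma code points.
CAPITAL = "\u03a3"  # Σ
MEDIAL = "\u03c3"   # σ, used at the start or middle of a word
FINAL = "\u03c2"    # ς, used at the end of a word


def lower_equal(a: str, b: str) -> bool:
    """Caseless comparison by lowercasing both sides (hypothetical helper)."""
    return a.lower() == b.lower()


def fold_equal(a: str, b: str) -> bool:
    """Caseless comparison by Unicode case folding (hypothetical helper)."""
    return a.casefold() == b.casefold()


# Both lowercase sigmas uppercase to the same capital letter...
assert MEDIAL.upper() == CAPITAL and FINAL.upper() == CAPITAL
# ...but lowercasing leaves them distinct, so they never compare equal.
assert not lower_equal(MEDIAL, FINAL)
# Case folding maps all three forms to the medial sigma.
assert fold_equal(MEDIAL, FINAL) and fold_equal(CAPITAL, FINAL)
```

If that reading is right, the inadequacy is that lowercase has two distinct
targets for one uppercase letter, while case folding picks a single one.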
Actually, what probably should be stored is a *normalized* case-folded
version of the string, because then (if I understand correctly) memcmp could
be used. I don't think memcmp technically works on non-ASCII text unless it's
in normalized form.
In any case, Phobos doesn't currently handle any of that stuff at all, so my
case-insensitive string type wouldn't be taking things backwards in that
regard.