RFC: Case-Insensitive Strings (And usually they really do *have* case)

Nick Sabalausky a at a.a
Mon Jan 10 14:00:40 PST 2011


"Daniel Gibson" <metalcaedes at gmail.com> wrote in message 
news:igfttf$p46$1 at digitalmars.com...
> Am 10.01.2011 22:16, schrieb Michel Fortin:
>>
>> Comparing the lowercase version of two strings works well for ASCII, but
>> I doubt it works very well for Unicode. Case conversion is not
>> bidirectional (for instance both 'SS' and 'ß' become 'ss' in lowercase in
>> German),
>
> That's wrong, 'ß' is lowercase and no upper-case version is used really, 
> though one exists in Unicode (see: 
> http://en.wikipedia.org/wiki/Capital_%C3%9F ).
> Sometimes, when stuff is written in fullcaps, 'ß' (which never is the 
> first character of a word) is replaced by "SS", but I wouldn't expect that 
> to be equal on icmp(). (e.g. "Strings vergleichen macht keinen Spaß!" vs 
> "STRINGS VERGLEICHEN MACHT KEINEN SPASS!")
>
> Anyway, in this case comparing in lowercase would cause no trouble at all 
> (comparing in uppercase however would, if you don't use the 
> not-really-existing-but-defined-by-unicode-Capital-ß).
>
> I don't know if there may be problems with special characters in other 
> languages, though.
>

One of the Unicode documents mentions an example involving the three Greek 
"sigma" letters, although I never quite understood how it demonstrated the 
inadequacy of using lower-case:

http://www.unicode.org/reports/tr21/tr21-5.html#Caseless_Matching
...Which references some information near the end of this sub-section:
http://www.unicode.org/reports/tr21/tr21-5.html#Introduction
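
For illustration, here is a minimal sketch of the sigma case. It is written in 
Python only because its standard library exposes both lowercasing and Unicode 
case folding; none of this is a Phobos API, and the helper names eq_lower and 
eq_fold are made up for the example. The point it tries to show: small sigma 
and final sigma are both already lower-case, so lowercasing leaves them 
different, while case folding maps all three sigmas to the same code point.

```python
# The three code points for the Greek letter sigma.
CAPITAL = "\u03a3"  # GREEK CAPITAL LETTER SIGMA
SMALL = "\u03c3"    # GREEK SMALL LETTER SIGMA
FINAL = "\u03c2"    # GREEK SMALL LETTER FINAL SIGMA (word-final form)


def eq_lower(a, b):
    """Hypothetical helper: compare by lowercasing both sides."""
    return a.lower() == b.lower()


def eq_fold(a, b):
    """Hypothetical helper: compare by Unicode case folding."""
    return a.casefold() == b.casefold()


if __name__ == "__main__":
    # Both are lower-case letters already, so lowercasing cannot unify them.
    print(eq_lower(SMALL, FINAL))
    # Case folding maps final sigma to small sigma, so these match.
    print(eq_fold(SMALL, FINAL))
    print(eq_fold(CAPITAL, FINAL))
```

So a lower-case comparison treats the two spellings of the same word (one 
with a final sigma, one without) as different, which is presumably what the 
Unicode report is getting at.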

Actually, what probably should be stored is a *normalized*, case-folded 
version of the string, because then (if I understand correctly) memcmp could 
be used. I don't think memcmp technically works on non-ASCII text (unless 
it's in normalized form).
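
A rough sketch of that idea, again in Python rather than D and under stated 
assumptions: the function name fold_key is hypothetical, NFD is chosen as the 
normalization form, and the key is encoded as UTF-8 so that the final 
comparison is a plain byte comparison (the analogue of memcmp for equality). 
The normalize / fold / normalize sequence follows the Unicode definition of a 
canonical caseless match.

```python
import unicodedata


def fold_key(s):
    """Hypothetical helper: return a normalized, case-folded byte key.

    Two strings that are canonically equivalent up to case produce
    identical bytes, so the keys can be compared byte for byte.
    """
    decomposed = unicodedata.normalize("NFD", s)
    folded = decomposed.casefold()
    # Folding can disturb normalization, so normalize once more.
    return unicodedata.normalize("NFD", folded).encode("utf-8")


if __name__ == "__main__":
    precomposed = "\u00c9cole"   # E-acute as a single code point
    combining = "e\u0301COLE"    # 'e' followed by COMBINING ACUTE ACCENT

    # Folding alone is not enough: the code point sequences still differ.
    print(precomposed.casefold() == combining.casefold())
    # With normalization the byte keys are identical.
    print(fold_key(precomposed) == fold_key(combining))
```

Note that full case folding also maps 'ß' to "ss", so under this scheme 
"Spaß" and "SPASS" would compare equal. Whether that is desirable is exactly 
the question raised in the quoted message above.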

In any case, Phobos doesn't currently handle any of that stuff at all, so my 
case-insensitive string type wouldn't be taking things backwards in that 
regard.



