RFC: Case-Insensitive Strings (And usually they really do *have* case)

Daniel Gibson metalcaedes at gmail.com
Mon Jan 10 13:30:50 PST 2011


On 10.01.2011 22:16, Michel Fortin wrote:
> On 2011-01-10 13:46:55 -0500, "Nick Sabalausky" <a at a.a> said:
>
>> Not carrying any other data means not caching the lowercase version, which
>> means recreating the lowercase version more than necessary. So it's the
>> classic speed vs. space tradeoff. I would think there would be cases where
>> they get compared enough for that to make a difference, although I suppose
>> we'd really need benchmarks to see. OTOH, there are certainly cases (such as
>> my original motivating case) where the extra space is not an issue at all.
>
> Comparing the lowercase version of two strings works well for ASCII, but I doubt
> it works very well for Unicode. Case conversion is not bidirectional (for
> instance both 'SS' and 'ß' become 'ss' in lowercase in German),

That's wrong: 'ß' is already lowercase, and an uppercase version is practically
never used, though one does exist in Unicode (see:
http://en.wikipedia.org/wiki/Capital_%C3%9F ).
Sometimes, when text is written in all caps, 'ß' (which is never the first
character of a word) is replaced by "SS", but I wouldn't expect the two
spellings to compare equal with icmp() (e.g. "Strings vergleichen macht keinen
Spaß!" vs "STRINGS VERGLEICHEN MACHT KEINEN SPASS!", i.e. "Comparing strings
is no fun!").

Anyway, in this case comparing the lowercase versions would cause no trouble at
all (comparing the uppercase versions, however, would, unless you map 'ß' to
the rarely used capital ß, U+1E9E, that Unicode defines).
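To make the lowercase-versus-uppercase point concrete, here is a small sketch. It is Python rather than D, and it uses Python's built-in str.lower(), str.upper() and str.casefold() only as a stand-in for the default Unicode case mappings; it says nothing about how Phobos' icmp() is actually implemented. The helper names eq_lower, eq_upper and eq_casefold are hypothetical.

```python
def eq_lower(a: str, b: str) -> bool:
    """Compare by lowercasing both sides (default Unicode mapping)."""
    return a.lower() == b.lower()


def eq_upper(a: str, b: str) -> bool:
    """Compare by uppercasing both sides; note that 'ß'.upper() == 'SS'."""
    return a.upper() == b.upper()


def eq_casefold(a: str, b: str) -> bool:
    """Compare using full Unicode case folding; 'ß' folds to 'ss'."""
    return a.casefold() == b.casefold()


mixed = "Strings vergleichen macht keinen Spaß!"
caps = "STRINGS VERGLEICHEN MACHT KEINEN SPASS!"

print(eq_lower(mixed, caps))     # False: 'ß' stays 'ß', while 'SS' becomes 'ss'
print(eq_upper(mixed, caps))     # True:  'ß' is expanded to 'SS'
print(eq_casefold(mixed, caps))  # True:  case folding also maps 'ß' to 'ss'

# The Unicode capital sharp s (U+1E9E) lowercases back to 'ß'.
print(eq_lower("Spaß", "SPA\u1E9E"))  # True
```

Note that full case folding, which is what Unicode uses for default caseless matching, treats the two sentences as equal, which is the result the post argues icmp() should not produce.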

I don't know whether special characters in other languages cause similar
problems, though.
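One well-known case is Turkish, where dotted and dotless i are separate letters. The sketch below again uses Python's str.lower() and str.upper() purely for illustration (not D or Phobos). These apply only the default, language-independent Unicode mappings, so they do not follow Turkish rules; the helpers eq_lower and eq_upper are hypothetical.

```python
def eq_lower(a: str, b: str) -> bool:
    """Compare by lowercasing both sides (default Unicode mapping)."""
    return a.lower() == b.lower()


def eq_upper(a: str, b: str) -> bool:
    """Compare by uppercasing both sides (default Unicode mapping)."""
    return a.upper() == b.upper()


dotless_i = "\u0131"  # 'ı', Turkish lowercase dotless i
dotted_I = "\u0130"   # 'İ', Turkish uppercase dotted I

# Both 'ı' and 'i' uppercase to plain 'I', so an uppercase comparison merges them...
print(eq_upper(dotless_i, "i"))  # True
# ...while a lowercase comparison keeps them apart.
print(eq_lower(dotless_i, "i"))  # False

# 'İ' lowercases to 'i' plus COMBINING DOT ABOVE (U+0307), not to plain 'i'.
print(dotted_I.lower() == "i\u0307")  # True
print(eq_lower(dotted_I, "i"))        # False

# Turkish pairs 'I' with 'ı', but the default mapping pairs it with 'i'.
print("I".lower())  # i
```

So lowercase and uppercase comparison disagree here as well, and neither matches what a Turkish reader would consider equal.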

> and what's equal
> and what is not sometimes depends on the language.
>
> Checking for string equality is a special case of the Unicode collation
> algorithm. I'm not sure if implementing this part of Unicode is in the scope of
> Phobos (probably not), but short of having Unicode support it seems the utility
> of having a special string type dedicated to ASCII case-insensitive strings is
> quite limited.
>
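The limitation Michel describes is easy to see in a sketch of an ASCII-only case-insensitive string that also caches its lowered form, which is the speed-for-space variant Nick weighs in the quoted text. This is Python rather than D, and the class name AsciiCIString is hypothetical; it is not a Phobos type.

```python
import string

# Translation table that lowercases only the 26 ASCII capitals A-Z;
# every other code point (e.g. 'Ä', 'ß') passes through unchanged.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


class AsciiCIString:
    """Hypothetical ASCII-only case-insensitive string.

    The folded key is computed once and cached, trading memory for not
    re-lowering the text on every comparison.
    """

    __slots__ = ("value", "_key")

    def __init__(self, value: str) -> None:
        self.value = value                         # original spelling, kept for display
        self._key = value.translate(_ASCII_LOWER)  # cached comparison key

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, AsciiCIString):
            return NotImplemented
        return self._key == other._key

    def __hash__(self) -> int:
        # Equal objects must hash equally, so hash the cached key.
        return hash(self._key)

    def __repr__(self) -> str:
        return f"AsciiCIString({self.value!r})"


print(AsciiCIString("Hello") == AsciiCIString("HELLO"))  # True
print(AsciiCIString("ärger") == AsciiCIString("ÄRGER"))  # False: 'Ä' is not ASCII
```

Caching the key means each object stores its text twice; a comparison is then a plain string comparison. The second example shows why such a type is of limited use outside ASCII text.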



More information about the Digitalmars-d mailing list