RFC: Case-Insensitive Strings (And usually they really do *have* case)

Michel Fortin michel.fortin at michelf.com
Mon Jan 10 13:16:31 PST 2011


On 2011-01-10 13:46:55 -0500, "Nick Sabalausky" <a at a.a> said:

> Not carrying any other data means not caching the lowercase version, which
> means recreating the lowercase version more than necessary. So it's the
> classic speed vs. space tradeoff. I would think there would be cases where
> they get compared enough for that to make a difference, although I suppose
> we'd really need benchmarks to see. OTOH, there are certainly cases (such as
> my original motivating case) where the extra space is not an issue at all.
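Nick's speed-vs-space tradeoff can be sketched as follows. The sketch is in Python rather than D, and all names (`CIString`, `ci_equal`) are hypothetical, not Phobos APIs. It uses `str.casefold()` instead of plain lowercasing, for the reasons Michel gives below. `CIString` pays for one extra string per instance but folds only once. `ci_equal` stores nothing extra but refolds both arguments on every call.

```python
class CIString:
    """Hypothetical case-insensitive string that caches its folded key.

    The folded copy costs extra memory per instance, but it is computed
    only once instead of on every comparison or hash.
    """

    __slots__ = ("value", "_key")

    def __init__(self, value: str) -> None:
        self.value = value              # original spelling, kept for display
        self._key = value.casefold()    # cached comparison key

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, CIString):
            return NotImplemented
        return self._key == other._key  # no re-folding here

    def __hash__(self) -> int:
        return hash(self._key)          # consistent with __eq__

    def __repr__(self) -> str:
        return f"CIString({self.value!r})"


def ci_equal(a: str, b: str) -> bool:
    """Space-saving alternative: fold both strings on every call."""
    return a.casefold() == b.casefold()


headers = {CIString("Content-Type"): "text/plain"}
print(headers[CIString("CONTENT-TYPE")])          # text/plain
print(ci_equal("Content-Type", "content-type"))   # True
```

Whether the cached key pays off depends on how often each string is compared. As Nick says, only benchmarks would settle it.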

Comparing the lowercase versions of two strings works well for ASCII, 
but I doubt it works very well for Unicode. Case conversion is not 
reversible (for instance, in German 'ß' becomes 'SS' in uppercase, and 
'SS' becomes 'ss', not 'ß', in lowercase), and what counts as equal 
sometimes depends on the language.
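The German example can be checked directly with Python's built-in string methods, used here only as an illustration. These methods apply the default, locale-independent Unicode case mappings, so language-specific rules (such as Turkish dotless i) are not applied.

```python
word = "Straße"

upper = word.upper()          # 'STRASSE': the default uppercase mapping of ß is 'SS'
round_trip = upper.lower()    # 'strasse': lowercasing cannot recover the ß
print(upper, round_trip)

# Lowercase-and-compare misses a match a German reader would expect:
print(word.lower() == "STRASSE".lower())        # False: 'straße' vs 'strasse'

# Unicode case folding maps ß to 'ss', so this comparison succeeds:
print(word.casefold() == "STRASSE".casefold())  # True
```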

Checking case-insensitive string equality can be treated as a special 
case of the Unicode collation algorithm (comparing at a strength that 
ignores case differences). I'm not sure whether implementing this part 
of Unicode is within the scope of Phobos (probably not), but without 
Unicode support, a string type dedicated to ASCII-only case-insensitive 
comparison seems of quite limited use.
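For contrast, an ASCII-only comparison is easy to sketch. This is the kind of scope such a string type would be limited to without Unicode support. The helper `ascii_ci_equal` is hypothetical. It folds only A-Z and passes every other character through unchanged, which is why it fails on accented letters and on ß.

```python
import string

# Map only A-Z to a-z; every non-ASCII character passes through unchanged.
_ASCII_TO_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def ascii_ci_equal(a: str, b: str) -> bool:
    """Case-insensitive equality for ASCII letters only (hypothetical helper)."""
    return a.translate(_ASCII_TO_LOWER) == b.translate(_ASCII_TO_LOWER)


print(ascii_ci_equal("Content-Type", "content-TYPE"))  # True: pure ASCII
print(ascii_ci_equal("ÉCOLE", "école"))                # False: É is not folded
print(ascii_ci_equal("Straße", "STRASSE"))             # False: ß vs SS
```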

-- 
Michel Fortin
michel.fortin at michelf.com
http://michelf.com/


