RFC: Case-Insensitive Strings (And usually they really do *have* case)

Nick Sabalausky a at a.a
Mon Jan 10 13:24:50 PST 2011


"Michel Fortin" <michel.fortin at michelf.com> wrote in message 
news:igft2o$291g$1 at digitalmars.com...
> On 2011-01-10 13:46:55 -0500, "Nick Sabalausky" <a at a.a> said:
>
>> Not carrying any other data means not caching the lowercase version,
>> which means recreating the lowercase version more than necessary. So
>> it's the classic speed vs. space tradeoff. I would think there would be
>> cases where they get compared enough for that to make a difference,
>> although I suppose we'd really need benchmarks to see. OTOH, there are
>> certainly cases (such as my original motivating case) where the extra
>> space is not an issue at all.
>
> Comparing the lowercase version of two strings works well for ASCII, but I 
> doubt it works very well for Unicode. Case conversion is not bidirectional 
> (for instance both 'SS' and 'ß' become 'ss' in lowercase in German), and 
> what's equal and what is not sometimes depends on the language.
>
> Checking for string equality is a special case of the Unicode collation 
> algorithm. I'm not sure if implementing this part of Unicode is in the 
> scope of Phobos (probably not), but short of having Unicode support it 
> seems the utility of having a special string type dedicated to ASCII 
> case-insensitive strings is quite limited.
>

Yeah, Phobos doesn't even have case-folding functions yet (which is why I 
keep saying "lowercase"). (This is actually one place where Phobos is still 
behind Tango.)

However, I really think that's orthogonal to this, since std.string.icmp 
doesn't handle such non-English issues either (just the English a-z, A-Z, 
and that's it). When Phobos does become multilingual, then this can be 
updated to follow suit.
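
A rough sketch of why the two concerns are separable, written in Python
purely for illustration. The names `CaseInsensitiveString` and
`ascii_lower` are hypothetical and are not part of Phobos. The wrapper
caches a normalized key once, and the normalization function is a
parameter, so ASCII-only lowercasing could later be swapped for real case
folding without touching the comparison logic:

```python
from typing import Callable


def ascii_lower(s: str) -> str:
    """Lowercase only A-Z; leave every other character untouched."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in s)


class CaseInsensitiveString:
    """Hypothetical wrapper: keeps the original text plus a cached key."""

    def __init__(self, text: str,
                 normalize: Callable[[str], str] = ascii_lower):
        self.text = text             # original spelling, case preserved
        self._key = normalize(text)  # computed once, reused by comparisons

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, CaseInsensitiveString):
            return NotImplemented
        return self._key == other._key

    def __hash__(self) -> int:
        # Hash the key so that equal strings land in the same bucket.
        return hash(self._key)
```

With the default normalizer, "Hello" and "hELLO" compare equal while "É"
and "é" do not. Passing `normalize=str.casefold` makes the latter pair
equal as well, which is the kind of later upgrade described above.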

One question though: Aren't 'SS' and 'ß' considered the same in German 
anyway? If so, how does using lowercase instead of case folding cause a 
problem?
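
For what it's worth, the difference can be seen in a small Python sketch
(Python is used only because it ships both operations as `str.lower` and
`str.casefold`; this says nothing about how Phobos would implement them).
'ß' is already a lowercase letter, so plain lowercasing leaves it alone,
whereas Unicode full case folding maps it to "ss". The sample words are
just illustrative:

```python
upper_form = "STRASSE"
sharp_form = "straße"

# Plain lowercasing: "strasse" vs. "straße", so the comparison fails.
lower_equal = upper_form.lower() == sharp_form.lower()

# Full case folding: both become "strasse", so the comparison succeeds.
fold_equal = upper_form.casefold() == sharp_form.casefold()

print(lower_equal, fold_equal)
```

So a lowercase-based comparison would treat the two spellings as
different, while a folding-based one would treat them as the same.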
