RFC: Case-Insensitive Strings (And usually they really do *have* case)
Nick Sabalausky
a at a.a
Mon Jan 10 13:24:50 PST 2011
"Michel Fortin" <michel.fortin at michelf.com> wrote in message
news:igft2o$291g$1 at digitalmars.com...
> On 2011-01-10 13:46:55 -0500, "Nick Sabalausky" <a at a.a> said:
>
>> Not carrying any other data means not caching the lowercase version, which
>> means recreating the lowercase version more than necessary. So it's the
>> classic speed vs. space tradeoff. I would think there would be cases where
>> they get compared enough for that to make a difference, although I suppose
>> we'd really need benchmarks to see. OTOH, there are certainly cases (such as
>> my original motivating case) where the extra space is not an issue at all.
>
> Comparing the lowercase version of two strings works well for ASCII, but I
> doubt it works very well for Unicode. Case conversion is not bidirectional
> (for instance both 'SS' and 'ß' become 'ss' in lowercase in German), and
> what's equal and what is not sometimes depends on the language.
>
> Checking for string equality is a special case of the Unicode collation
> algorithm. I'm not sure if implementing this part of Unicode is in the
> scope of Phobos (probably not), but short of having Unicode support it
> seems the utility of having a special string type dedicated to ASCII
> case-insensitive strings is quite limited.
>
Yeah, Phobos doesn't even have case-folding functions yet (which is why I
keep saying "lowercase"). (This is actually one place where Phobos is still
behind Tango.)
However, I really think that's orthogonal to this, since std.string.icmp
doesn't handle such non-English issues either (just the English a-z, A-Z,
and that's it). When Phobos does become multilingual, this can be
updated to follow suit.
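
To make the caching tradeoff and the ASCII-only scope concrete, here is a
minimal sketch, written in Python rather than D. The names CaseInsensitiveStr
and ascii_lower are hypothetical and exist only for this illustration; this
is not the proposed Phobos type, and it is not how std.string.icmp is
implemented. It only assumes the behavior described above: A-Z maps to a-z
and everything else is left alone.

```python
import string

# Translation table covering only the English letters A-Z -> a-z.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def ascii_lower(s):
    """Lowercase A-Z only; non-ASCII characters pass through unchanged."""
    return s.translate(_ASCII_LOWER)


class CaseInsensitiveStr:
    """Hypothetical wrapper: original text plus a cached lowercased key.

    The cached key costs extra space but avoids re-lowercasing on every
    comparison (the speed vs. space tradeoff discussed in this thread).
    """

    __slots__ = ("text", "_key")

    def __init__(self, text):
        self.text = text               # original casing is preserved
        self._key = ascii_lower(text)  # computed once, reused afterwards

    def __eq__(self, other):
        if isinstance(other, CaseInsensitiveStr):
            return self._key == other._key
        return NotImplemented

    def __hash__(self):
        # Hash the key so equal values land in the same dict/set bucket.
        return hash(self._key)

    def __str__(self):
        return self.text
```

With this sketch, CaseInsensitiveStr("Hello") and CaseInsensitiveStr("hELLO")
compare equal and can share a dictionary slot, while str() still returns the
text as originally written. Whether the cached key actually pays for itself
would still need benchmarks.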
One question though: Aren't 'SS' and 'ß' considered the same in German
anyway? If so, how does using lowercase instead of folding case cause a
problem?
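
One way to look at that question is to try it in a language that already
ships both operations. The sketch below uses Python's built-in str.lower and
str.casefold; it says nothing about what Phobos or Tango do, and the two
helper names are hypothetical. In Python, 'ß' is already a lowercase letter,
so lowercasing leaves it alone, while case folding maps it to 'ss'.

```python
def equal_by_lowercase(a, b):
    """Compare after simple lowercasing."""
    return a.lower() == b.lower()


def equal_by_casefold(a, b):
    """Compare after Unicode case folding."""
    return a.casefold() == b.casefold()


# 'STRASSE' lowercases to 'strasse', but 'straße' stays 'straße',
# so the lowercase comparison treats the two spellings as different.
lower_result = equal_by_lowercase("STRASSE", "straße")

# Case folding turns 'ß' into 'ss', so both sides become 'strasse'.
fold_result = equal_by_casefold("STRASSE", "straße")
```

Here lower_result is False and fold_result is True, which is the gap between
a lowercase-based comparison and a folding-based one. Whether the two
spellings *should* compare equal is a separate, language-dependent question.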