Another Q about Unicode, Folding Greek edition!
Don
nospam at nospam.com
Wed Jun 9 01:30:37 PDT 2010
Nick Sabalausky wrote:
> Thanks all for the helpful responses. Since we seem to have some real
> Unicode-knowledge people here, I'd like to repost a question I had asked
> elsewhere awhile back, but didn't get an answer:
>
> --------------------------------------------------------------------------------
> Can someone explain how folding-case differs from lower-case and why it
> should be used for case-insensitive matching instead of lower-case?
>
> I was looking at this document, but still don't get it:
> http://www.unicode.org/reports/tr21/tr21-5.html
>
> The only part I see that directly addresses that is this:
>
> Case-folding is more than just conversion to lowercase.
> For example, it handles cases such as the Greek sigma,
> so that "Μάϊος" and "ΜΆΪΟΣ" will match correctly.
>
> Which references what it says earlier about sigma:
>
> Characters may also have different case mappings,
> depending on the context.
>
> For example, U+03A3 "Σ" capital sigma lowercases to
> U+03C3 "σ" small sigma if it is followed by another
> letter, but lowercases to U+03C2 "ς" small
> final sigma if it is not.
>
> But I still don't see how that demonstrates a need for anything other than
> toLower provided that the given toLower routine is already properly handling
> the "end of word"/"not end of word" difference.
> --------------------------------------------------------------------------------
>
> Unless, it's just extra speed due to not having to handle things like the
> "end of word"/"not end of word" difference?
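If you want to do a case-insensitive search for "as" in " basdaS " in
English, you can just convert both to lower case, and you'll find both
occurrences.
Now suppose you want to find "as" in the string " basdas ", where it's
all in Greek. It still occurs twice, but if you convert it to lower
case, the two s's become different characters: the one in the middle of
the word is a small sigma (U+03C3) and the one at the end is a small
final sigma (U+03C2). Comparing the toLower() results therefore misses
one of the occurrences. Case folding maps both forms to the same
character, so both are found.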
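Here is a small sketch of that difference in Python, used only because
its str.lower() applies the final-sigma rule and its str.casefold()
applies Unicode case folding. The Greek strings are my own stand-ins for
" basdas " and "as", not text from the report, and I'm assuming a
Python 3 recent enough to have casefold().

```python
# Hypothetical Greek stand-ins for the " basdas " / "as" example.
haystack = " ΒΑΣΔΑΣ "
needle = "ΑΣ"

# Context-sensitive lowercasing: the sigma before delta becomes σ (U+03C3),
# while a sigma at the end of a word becomes ς (U+03C2).
lowered_haystack = haystack.lower()   # " βασδας "
lowered_needle = needle.lower()       # "ας"
lowered_hits = lowered_haystack.count(lowered_needle)

# Case folding: every sigma, final or not, becomes σ (U+03C3).
folded_haystack = haystack.casefold() # " βασδασ "
folded_needle = needle.casefold()     # "ασ"
folded_hits = folded_haystack.count(folded_needle)

print("lower():   ", lowered_hits, "match(es)")   # 1
print("casefold():", folded_hits, "match(es)")    # 2
```

The lowercased needle ends in a final sigma, so it only matches the
occurrence at the end of the word. After folding there is only one kind
of sigma left, so both occurrences match.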