Another Q about Unicode, Folding Greek edition!

Wed Jun 9 01:30:37 PDT 2010

Nick Sabalausky wrote:
> Thanks all for the helpful responses. Since we seem to have some real 
> Unicode-knowledge people here, I'd like to repost a question I had asked 
> elsewhere a while back but never got an answer to:
> 
> --------------------------------------------------------------------------------
> Can someone explain how case folding differs from lowercasing, and why
> case folding should be used for case-insensitive matching instead?
> 
> I was looking at this document, but still don't get it: 
> http://www.unicode.org/reports/tr21/tr21-5.html
> 
> The only part I see that directly addresses that is this:
> 
>       Case-folding is more than just conversion to lowercase.
>       For example, it handles cases such as the Greek sigma,
>       so that "μάϊος" and "ΜΆΪΟΣ" will match correctly.
> 
> Which references what it says earlier about sigma:
> 
>       Characters may also have different case mappings,
>       depending on the context.
> 
>       For example, U+03A3 "Σ" capital sigma lowercases to
>       U+03C3 "σ" small sigma if it is followed by another
>       letter, but lowercases to U+03C2 "ς" small
>       final sigma if it is not.
> 
> But I still don't see how that demonstrates a need for anything other than
> toLower, as long as the toLower routine already handles the
> "end of word"/"not end of word" distinction properly.
>
> --------------------------------------------------------------------------------
> 
> Or is it just for speed, since folding doesn't have to handle things like the
> "end of word"/"not end of word" distinction?
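For reference, Python 3.3+ behaves the way the question assumes: `str.lower()` applies Unicode's Final_Sigma context rule, while `str.casefold()` applies context-free full case folding. This is a minimal sketch; the string `ΒΑΣΔΑΣ` is a made-up example (not a real word) chosen to have one mid-word and one word-final capital sigma.

```python
# Requires Python 3.3+ (full case mappings and str.casefold()).
# Hypothetical example string: one capital sigma mid-word, one at the end.
text = "ΒΑΣΔΑΣ"

# str.lower() is context-sensitive (Final_Sigma rule):
# medial Σ -> σ (U+03C3), word-final Σ -> ς (U+03C2).
lowered = text.lower()

# str.casefold() is context-free: every sigma form folds to σ.
folded = text.casefold()

print(lowered)  # βασδας
print(folded)   # βασδασ

# Lowercasing leaves two distinct small sigmas; folding leaves one.
print(len({c.lower() for c in "Σσς"}))     # 2
print(len({c.casefold() for c in "Σσς"}))  # 1
```

So even a correct, context-aware toLower produces two different sigmas, which matters as soon as you compare substrings rather than whole words.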

If you want to do a case-insensitive search for "as" in " basdaS " in
English, you can just convert both strings to lowercase, and you'll find
both occurrences.

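Now suppose you want to find "as" in the string " basdas ", but written
in Greek letters, so each s is a sigma. It still occurs twice, but if you
convert both strings to lowercase, the two sigmas in the text become
different characters: σ (U+03C3) in the middle of the word and ς (U+03C2)
at the end. Whichever form the lowercased search string ends up with, it
matches only one of the two occurrences, so toLower() doesn't work. Case
folding maps every sigma to σ, so both occurrences match.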
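The search above can be sketched in Python 3.3+ by applying the same normalization to both the pattern and the text and counting matches. `count_ci` is a hypothetical helper, and " ΒΑΣΔΑΣ " is an illustrative Greek stand-in for " basdas ", not a real word.

```python
def count_ci(pattern: str, text: str, normalize) -> int:
    """Hypothetical helper: count non-overlapping case-insensitive
    matches, using `normalize` on both pattern and text."""
    return normalize(text).count(normalize(pattern))

# English: lowercasing both sides finds both occurrences.
english = count_ci("as", " basdaS ", str.lower)

# Greek stand-in for the same example.
greek_text = " ΒΑΣΔΑΣ "
greek_pattern = "ΑΣ"

# lower(): pattern -> "ας", text -> " βασδας " (medial σ, final ς).
by_lower = count_ci(greek_pattern, greek_text, str.lower)
# casefold(): pattern -> "ασ", text -> " βασδασ " (all sigmas -> σ).
by_fold = count_ci(greek_pattern, greek_text, str.casefold)

print(english, by_lower, by_fold)  # 2 1 2
```

The helper counts matches instead of returning positions because folding can change string length (for example "ß" folds to "ss"), so offsets in the folded text need not line up with the original. Full caseless matching as Unicode defines it may also require normalization (e.g. `unicodedata.normalize`) around the folding step; this sketch skips that.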