GREETINGS FROM iSTANBUL
Paul Backus
snarwin at gmail.com
Sun Aug 1 18:22:05 UTC 2021
On Sunday, 1 August 2021 at 17:56:00 UTC, rikki cattermole wrote:
> It appears you are using the wrong lowercase character.
>
> https://en.wikipedia.org/wiki/Dotted_and_dotless_I
>
> From a quick experiment, it appears std.uni is treating the
> upper case dotted I's lower case as a grapheme. Which it
> probably shouldn't be as there is an actual character for that.
>
> We might need to update our unicode database... or something.
It's not the wrong lower-case character. Turkish uses U+0069
(a.k.a. ASCII 'i') for lower-case dotted I, but has a non-default
case mapping that pairs U+0069 with U+0130 ('İ') rather than
U+0049 (ASCII 'I'). Phobos' std.uni uses the default case mapping
for its toUpper function, so it does not produce the correct
result for Turkish text.
Source: https://www.unicode.org/faq/casemap_charprop.html#1
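To make the difference between the two mappings concrete, here is a minimal sketch in Go rather than D, since Go's standard library ships a Turkish case table. The helper names `defaultUpper` and `turkishUpperRune` are made up for illustration. It only shows how a single code point maps under each table and is not a statement about how std.uni is implemented.

```go
package main

import (
	"fmt"
	"unicode"
)

// defaultUpper applies the locale-independent (default) case mapping.
// Hypothetical helper name, used only for this illustration.
func defaultUpper(r rune) rune {
	return unicode.ToUpper(r)
}

// turkishUpperRune applies Go's built-in Turkish special-case table,
// which pairs U+0069 with U+0130.
// Hypothetical helper name, used only for this illustration.
func turkishUpperRune(r rune) rune {
	return unicode.TurkishCase.ToUpper(r)
}

func main() {
	fmt.Printf("default: U+%04X\n", defaultUpper('i'))     // U+0049 ('I')
	fmt.Printf("Turkish: U+%04X\n", turkishUpperRune('i')) // U+0130 ('İ')
}
```

The input code point is the same in both calls; only the mapping table differs.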
A common solution to this in other languages is to have a version
of toUpper that takes a locale as an argument. Some examples:
- JavaScript:
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/toLocaleUpperCase
- Go: https://pkg.go.dev/strings#ToUpperSpecial
- Java:
https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/lang/String.html#toUpperCase(java.util.Locale)
- C#:
https://docs.microsoft.com/en-US/dotnet/api/system.string.toupper?view=net-5.0
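As a rough sketch of what the locale-aware approach looks like in practice, here is the Go variant from the list above applied to a whole string. `strings.ToUpperSpecial` and `unicode.TurkishCase` are real standard-library names. The wrappers `upperDefault` and `upperTurkish` are hypothetical and exist only to keep the example readable.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// upperDefault uppercases s with the default, locale-independent mapping.
// Hypothetical wrapper for illustration.
func upperDefault(s string) string {
	return strings.ToUpper(s)
}

// upperTurkish uppercases s with the Turkish special-case mapping.
// Hypothetical wrapper for illustration.
func upperTurkish(s string) string {
	return strings.ToUpperSpecial(unicode.TurkishCase, s)
}

func main() {
	const greeting = "greetings from istanbul"
	fmt.Println(upperDefault(greeting)) // GREETINGS FROM ISTANBUL
	fmt.Println(upperTurkish(greeting)) // GREETİNGS FROM İSTANBUL
}
```

Note that the Turkish mapping also changes the 'i' in "greetings". The case table applies to the whole string regardless of which language each word is in, which is why these APIs make the caller choose the locale explicitly instead of guessing.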