Replacing tango.text.Ascii.isearch

bauss jacobbauss at gmail.com
Thu Oct 13 08:42:14 UTC 2022


On Thursday, 13 October 2022 at 08:35:50 UTC, bauss wrote:
> On Thursday, 13 October 2022 at 08:30:04 UTC, rikki cattermole 
> wrote:
>> On 13/10/2022 9:27 PM, bauss wrote:
>>> This doesn't actually work properly in all languages. It will 
>>> probably work in most, but it's not entirely correct.
>>> 
>>> Ex. Turkish will not work with it properly.
>>> 
>>> Very interesting article: 
>>> http://www.moserware.com/2008/02/does-your-code-pass-turkey-test.html
>>
>> Yes, Turkic languages. They require a state machine and quite a 
>> few LUTs to work correctly.
>>
>> You also need to provide a language and it has to operate on 
>> the whole string, not individual characters.
>>
>> I didn't think it was relevant since Ascii was in the original 
>> post ;)
>
> I think it's relevant when it comes to D since D is arguably a 
> Unicode language, not ASCII.
>
> D should strive to be correct, rather than fast.

Oh, and to add to this: if you have to do it the hacky way, prefer 
converting to uppercase rather than lowercase. A small group of 
characters cannot make a round trip when converted to lowercase, 
and normalizing to uppercase avoids that, so it's a relatively 
easy fix. A round trip here means converting characters from one 
culture to another and then back to the original characters. That 
is impossible for some characters when converting to lowercase, 
but should always be possible when converting to uppercase.
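
For illustration, a minimal sketch of the "hacky way" with uppercase 
normalization, using only std.uni.toUpper and std.string.indexOf from 
Phobos. The helper name containsIgnoreCase is made up for this example 
and is not part of Phobos or Tango. It is not a language-aware search: 
std.uni.toUpper takes no locale, so Turkish rules are not applied.

```d
import std.stdio : writeln;
import std.string : indexOf;
import std.uni : toUpper;

// Hypothetical helper (not part of Phobos or Tango): reports whether
// `needle` occurs in `haystack`, ignoring case, by uppercasing both sides.
bool containsIgnoreCase(string haystack, string needle)
{
    // std.uni.toUpper is locale-independent, so language-specific rules
    // (such as the Turkish dotted and dotless i) are NOT handled here.
    return haystack.toUpper.indexOf(needle.toUpper) != -1;
}

void main()
{
    assert(containsIgnoreCase("Hello World", "WORLD"));
    assert(containsIgnoreCase("Hello World", "hello"));
    assert(!containsIgnoreCase("Hello World", "tango"));
    writeln("all checks passed");
}
```

The helper returns a bool rather than an index on purpose: uppercasing 
can change the length of a string, so an index into the uppercased copy 
would not necessarily point at the same place in the original.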


More information about the Digitalmars-d-learn mailing list