Compare string with German umlauts

Steven Schveighoffer schveiguy at gmail.com
Mon May 18 14:28:33 UTC 2020


On 5/18/20 9:44 AM, Martin Tschierschke wrote:
> Hi,
> I have to find a certain line in a file, with a text containing umlauts.
> 
> How do you do this?
> 
> The following was not working:
> 
> foreach(i,line; file){
>   if(line=="My text with ö oe, ä ae or ü"){
>     writeln("found it at line",i);
>   }
> }
> 
> I ended up using line.canFind("with part of the text without umlaut").
> 
> It solved the problem, but what is the right way to use umlauts (encode 
> them) inside the program?
> 

Using == on strings compares the exact code units for equality. In 
Unicode, the same grapheme can be encoded in more than one way. 
For example, ö can be a single code point, the o with a diaeresis (U+00F6). 
But you could also encode it with 2 code points: a standard o followed by a 
combining diaeresis (U+006F, U+0308). The two forms render identically but 
do not compare equal with ==.
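
To make the difference concrete, here is a small sketch (the variable names 
are just for illustration) that builds both encodings of ö and counts their 
UTF-8 code units, code points, and graphemes:

```d
import std.range : walkLength;
import std.stdio : writeln;
import std.uni : byGrapheme;

void main()
{
    string precomposed = "\u00F6";   // ö as one code point (U+00F6)
    string decomposed  = "o\u0308";  // o (U+006F) + combining diaeresis (U+0308)

    // == compares the underlying code units, so these differ
    writeln(precomposed == decomposed);                           // false

    // UTF-8 code units (bytes)
    writeln(precomposed.length, " ", decomposed.length);          // 2 3

    // code points (strings are decoded while walking)
    writeln(precomposed.walkLength, " ", decomposed.walkLength);  // 1 2

    // graphemes, i.e. what a reader perceives as one character
    writeln(precomposed.byGrapheme.walkLength, " ",
            decomposed.byGrapheme.walkLength);                    // 1 1
}
```

Only the grapheme count agrees, which is why a line that looks identical to 
your literal can still fail the == test.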

What you need is to normalize both strings to the same form (for example 
NFC) before comparing them: 
https://dlang.org/phobos/std_uni.html#normalize

For more reference: https://en.wikipedia.org/wiki/Combining_character
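
A minimal sketch of how that could look, assuming the lines are already 
available as strings. The helper name sameText and the sample data are made 
up for illustration, and I can't tell from your post whether your file 
really contains the decomposed form:

```d
import std.stdio : writeln;
import std.uni : normalize, NFC;

/// Hypothetical helper: true if both strings are equal after NFC normalization.
bool sameText(string a, string b)
{
    return normalize!NFC(a) == normalize!NFC(b);
}

void main()
{
    // Search text written with the precomposed ö (U+00F6)
    string needle = "My text with \u00F6";

    // Sample lines; the second one uses o + combining diaeresis
    string[] lines = ["first line", "My text with o\u0308", "last line"];

    foreach (i, line; lines)
    {
        if (sameText(line, needle))
            writeln("found it at line ", i);   // matches index 1
    }
}
```

Normalizing both sides on every comparison is the simple version; if you 
search a large file you would normalize the search text once and only the 
lines inside the loop.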

-Steve

