The Case Against Autodecode

Marco Leise via Digitalmars-d digitalmars-d at puremagic.com
Tue May 31 14:11:42 PDT 2016


On Tue, 31 May 2016 13:06:16 -0400,
Andrei Alexandrescu <SeeWebsiteForEmail at erdani.org> wrote:

> On 05/31/2016 12:54 PM, Jonathan M Davis via Digitalmars-d wrote:
> > Equality does not require decoding. Similarly, functions like find don't
> > either. Something like filter generally would, but it's also not
> > particularly normal to filter a string on a by-character basis. You'd
> > probably want to get to at least the word level in that case.  
> 
> It's nice that the stdlib takes care of that.

Both "equality" and "find" require byGrapheme.

 ⇰ The equivalence algorithm first brings both strings to a
   common normalization form (NFD or NFC), a step that works on
   one grapheme cluster at a time, and only afterwards does the
   binary comparison.
   http://www.unicode.org/reports/tr15/#Canon_Compat_Equivalence

 ⇰ A code-point-level find would yield false positives at the start
   of grapheme clusters, e.g. it would match 'o' in an NFD "ö"
   (simplified example).
   http://www.unicode.org/reports/tr10/#Searching
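
A minimal sketch of both points, written in Python rather than D only because its standard library makes the example self-contained. The helper names `canonically_equal` and `boundary_aware_find` are hypothetical, and the boundary check is a deliberate simplification: it treats any code point with a non-zero canonical combining class as continuing the previous cluster, which is not the full grapheme segmentation that byGrapheme performs.

```python
import unicodedata


def canonically_equal(a: str, b: str) -> bool:
    """Bring both strings to one normalization form, then compare."""
    # NFC is an arbitrary choice here; NFD would work as well,
    # as long as both sides use the same form.
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def is_combining(ch: str) -> bool:
    """Simplified test: non-zero canonical combining class."""
    return unicodedata.combining(ch) != 0


def boundary_aware_find(haystack: str, needle: str) -> int:
    """Find needle, rejecting matches that cut through a cluster.

    Assumption: a cluster is a base code point plus the combining
    marks that follow it. Real segmentation has more rules.
    """
    if not needle:
        return 0
    start = 0
    while True:
        i = haystack.find(needle, start)
        if i < 0:
            return -1
        end = i + len(needle)
        starts_ok = not is_combining(haystack[i])
        ends_ok = end == len(haystack) or not is_combining(haystack[end])
        if starts_ok and ends_ok:
            return i
        start = i + 1


nfc = "\u00f6"   # "ö" as one precomposed code point
nfd = "o\u0308"  # "ö" as 'o' followed by a combining diaeresis

print(nfc == nfd)                     # plain comparison of code points
print(canonically_equal(nfc, nfd))    # comparison after normalization
print(nfd.find("o"))                  # code-point-level find
print(boundary_aware_find(nfd, "o"))  # cluster-aware find
```

The plain comparison reports the two spellings of "ö" as different, and the plain find reports an 'o' at index 0 of the NFD string. The other two calls give the answers a user would expect: the strings are equal, and there is no standalone 'o'.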

-- 
Marco


