The Case Against Autodecode
Marco Leise via Digitalmars-d
digitalmars-d at puremagic.com
Tue May 31 14:11:42 PDT 2016
On Tue, 31 May 2016 13:06:16 -0400,
Andrei Alexandrescu <SeeWebsiteForEmail at erdani.org> wrote:
> On 05/31/2016 12:54 PM, Jonathan M Davis via Digitalmars-d wrote:
> > Equality does not require decoding. Similarly, functions like find don't
> > either. Something like filter generally would, but it's also not
> > particularly normal to filter a string on a by-character basis. You'd
> > probably want to get to at least the word level in that case.
>
> It's nice that the stdlib takes care of that.
Both "equality" and "find" require byGrapheme.
⇰ The equivalence algorithm first brings both strings to a
common normalization form (NFD or NFC), which works on one
grapheme cluster at a time, and only afterwards does the
binary comparison.
http://www.unicode.org/reports/tr15/#Canon_Compat_Equivalence
⇰ Find would yield false positives at the start of grapheme
clusters, i.e. it will match 'o' in an NFD "ö" (simplified
example).
http://www.unicode.org/reports/tr10/#Searching
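
To make both points concrete, here is a minimal sketch. It is in
Python rather than D, purely because Python's standard unicodedata
module is enough to show the effect. The helper names
canonically_equal and find_at_boundary are made up for this
illustration, and the boundary test (reject a match that is
directly followed by a combining mark) is only a rough
approximation of real grapheme cluster segmentation as described
in UAX #29.

```python
import unicodedata

COMPOSED = "\u00F6"     # "ö" as one precomposed code point (NFC)
DECOMPOSED = "o\u0308"  # "o" followed by COMBINING DIAERESIS (NFD)


def canonically_equal(a: str, b: str) -> bool:
    """Compare two strings after bringing both to the same form (NFD)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


def find_at_boundary(haystack: str, needle: str) -> int:
    """Find needle, skipping matches that end inside a combining sequence.

    Simplification (an assumption of this sketch): a match is rejected
    only when the next code point has a non-zero canonical combining
    class. The returned index refers to the NFD form of haystack,
    or is -1 if there is no acceptable match.
    """
    h = unicodedata.normalize("NFD", haystack)
    n = unicodedata.normalize("NFD", needle)
    start = h.find(n)
    while start != -1:
        end = start + len(n)
        if end == len(h) or unicodedata.combining(h[end]) == 0:
            return start
        start = h.find(n, start + 1)
    return -1


if __name__ == "__main__":
    # Plain code point comparison treats the two spellings as different.
    print(COMPOSED == DECOMPOSED)
    # Comparison after normalization treats them as the same text.
    print(canonically_equal(COMPOSED, DECOMPOSED))
    # Naive search finds 'o' inside the decomposed "ö".
    print(DECOMPOSED.find("o"))
    # The boundary-aware search does not.
    print(find_at_boundary(DECOMPOSED, "o"))
```

The naive search reports a hit at index 0 although the text
contains no standalone 'o', which is the false positive described
above.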
--
Marco