std.algorithm.remove and principle of least astonishment
spir
denis.spir at gmail.com
Mon Nov 22 05:57:39 PST 2010
On Mon, 22 Nov 2010 07:34:15 -0500
Michel Fortin <michel.fortin at michelf.com> wrote:
> Just to add to the complexity: graphemes aren't always equivalent to
> user-perceived characters either. Ligatures can contain more than one
> user-perceived character. If you're looking for the substring
> "flourish" in a string, should it fail to match when it encounters
> "ﬂourish" just because of the "ﬂ" (fl) ligature? In most Mac
> applications it matches both, thanks to sensible defaults in NSString's
> search and comparison algorithms.
That's true. I guess you're thinking of the distinction between the NFD/NFC "canonical" forms and the NFKD/NFKC ones (the so-called "compatibility" forms).
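To make the canonical/compatibility distinction concrete, here is a minimal sketch in Python using the standard `unicodedata.normalize` function (Python is used only for illustration; this is not a claim about how Phobos or NSString implement search). The helper `contains` is hypothetical: it normalizes both strings to one form and then does a plain substring test. The canonical forms leave the "ﬂ" ligature (U+FB02) intact, so "flourish" is not found; the compatibility forms decompose it into "f" + "l", so it is.

```python
import unicodedata

# U+FB02 LATIN SMALL LIGATURE FL: a single code point standing for "fl".
LIGATURE_TEXT = "\ufb02ourish"
PLAIN_TEXT = "flourish"


def contains(haystack: str, needle: str, form: str) -> bool:
    """Hypothetical helper: substring test after normalizing both strings to `form`."""
    return unicodedata.normalize(form, needle) in unicodedata.normalize(form, haystack)


# Canonical forms (NFC/NFD) have no decomposition for the ligature: no match.
print(contains(LIGATURE_TEXT, PLAIN_TEXT, "NFC"))   # False
print(contains(LIGATURE_TEXT, PLAIN_TEXT, "NFD"))   # False

# Compatibility forms (NFKC/NFKD) replace the ligature with "f" + "l": match.
print(contains(LIGATURE_TEXT, PLAIN_TEXT, "NFKC"))  # True
print(contains(LIGATURE_TEXT, PLAIN_TEXT, "NFKD"))  # True
```

Note that compatibility normalization is lossy (the ligature cannot be recovered afterwards), which is one reason it is usually applied only for matching, not for storage.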
> So perhaps we need yet another layer over graphemes to represent
> user-perceived characters.
In my view, this is not the responsibility of a general-purpose tool. I may be wrong, but I think we are clearly entering the field of application logic and semantics. To me these are _not_ general-purpose concerns (although builtin types and libraries often offer clearly non-general routines, such as ones dealing with casing, or even less general ones, like the set of ASCII letters). These issues would have to be dealt with either by applications or by domain-specific libraries.
I find it wrong that Unicode even provides standard canonical forms for them (but fortunately common libs do not implement them, AFAIK).
denis
-- -- -- -- -- -- --
vit esse estrany ☣
spir.wikidot.com