The Case Against Autodecode
Chris via Digitalmars-d
digitalmars-d at puremagic.com
Sun May 29 04:25:11 PDT 2016
On Saturday, 28 May 2016 at 22:29:12 UTC, Andrew Godfrey wrote:
[snip]
> From all the detail in this thread, I wonder now if "a
> grapheme" is even an unambiguous concept across different
> environments.
Unicode graphemes are not always the same as graphemes in natural
(written) languages. Even if <é> is encoded in Unicode as two code
points (<e> followed by a combining acute accent), it is still one
grapheme in the written language, not two distinct characters.
Conversely, in natural languages two characters can form one
grapheme, as with English <sh>, which represents the sound in
`shower`, `shop` and `fish`. In German the same sound is represented
by three characters, <sch>, as in `Schaf` ("sheep"). This is a bit
nit-picky, but we should make clear that we are talking about
"Unicode graphemes" that map to single characters on the written
page. But is that even possible across all languages?
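To make the distinction concrete, here is a minimal sketch in D,
assuming a current Phobos. It counts the same strings at three levels:
UTF-8 code units, code points (via autodecoding `walkLength`), and
Unicode graphemes (via `std.uni.byGrapheme`). The helper names
`codeUnits`, `codePoints` and `graphemes` are hypothetical, made up
for illustration.

```d
import std.range : walkLength;
import std.stdio : writefln;
import std.uni : byGrapheme, normalize, NFC;

// Hypothetical helpers: count a string at three different levels.
size_t codeUnits(string s) { return s.length; }                  // UTF-8 bytes
size_t codePoints(string s) { return s.walkLength; }             // autodecodes to dchar
size_t graphemes(string s) { return s.byGrapheme.walkLength; }   // Unicode grapheme clusters

void main()
{
    immutable composed = "\u00E9";    // é as one precomposed code point
    immutable decomposed = "e\u0301"; // e + COMBINING ACUTE ACCENT

    // NFC normalization turns the decomposed form into the precomposed one.
    assert(normalize!NFC(decomposed) == composed);

    // German <sch> is one grapheme on the page, but three Unicode graphemes.
    foreach (s; [composed, decomposed, "sch"])
        writefln("%s: %s code units, %s code points, %s graphemes",
                 s, codeUnits(s), codePoints(s), graphemes(s));
}
```

Both spellings of <é> count as one Unicode grapheme even though they
differ in code units and code points, while <sch> counts as three. So
"Unicode grapheme" matches the written-language notion for <é> but
not for <sh> or <sch>.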
To avoid confusion and misunderstandings, we should first agree on
the terminology.