The Case Against Autodecode

Chris via Digitalmars-d digitalmars-d at puremagic.com
Sun May 29 04:25:11 PDT 2016


On Saturday, 28 May 2016 at 22:29:12 UTC, Andrew Godfrey wrote:
[snip]
> From all the detail in this thread, I wonder now if "a 
> grapheme" is even an unambiguous concept across different 
> environments.

Unicode graphemes are not always the same as graphemes in natural 
(written) languages. If <é> is built in Unicode from two code 
points (a base <e> plus a combining accent), it is still one 
grapheme in a written language, not two distinct characters. 
Conversely, in natural languages two characters can form one 
grapheme, as in English <sh>, which represents the sound in 
`shower, shop, fish`. In German the same sound is represented by 
three characters, <sch>, as in `Schaf` ("sheep"). This is a bit 
nit-picky, but we should make clear that we are talking about 
"Unicode graphemes" that map to single characters on the written 
page. But is that at all possible across all languages?
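
To make the <é> and <sh> cases concrete, here is a minimal sketch. 
It is in Python only because its standard library ships the Unicode 
tables; it is not code from this thread. The helper names 
`code_points` and `naive_cluster_count` are made up for 
illustration, and skipping combining marks is only a rough stand-in 
for the real grapheme cluster rules, not an implementation of them.

```python
import unicodedata

# Two encodings of the same written character <é>.
precomposed = "\u00e9"   # one code point: LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # two code points: <e> + COMBINING ACUTE ACCENT


def code_points(s):
    """Return the code points of s in U+XXXX notation."""
    return [f"U+{ord(c):04X}" for c in s]


def naive_cluster_count(s):
    """Count code points that are not combining marks.

    Assumption: this only approximates "Unicode graphemes"; real
    segmentation has many more rules than this.
    """
    return sum(1 for c in s if unicodedata.combining(c) == 0)


# Normalization converts between the two encodings.
same_after_nfc = unicodedata.normalize("NFC", decomposed) == precomposed

# Both encodings of <é> count as one cluster, but the English
# grapheme <sh> counts as two: the linguistic unit and the
# Unicode unit do not line up.
e_clusters = naive_cluster_count(decomposed)
sh_clusters = naive_cluster_count("sh")
```

Here `len(precomposed)` and `len(decomposed)` differ even though a 
reader sees the same character, while <sh> is one grapheme to a 
linguist and two units to this kind of counting.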

To avoid confusion and misunderstandings we should agree on the 
terminology first.

