The Case Against Autodecode

Chris via Digitalmars-d digitalmars-d at puremagic.com
Sun May 29 10:06:28 PDT 2016


On Sunday, 29 May 2016 at 13:04:18 UTC, Tobias M wrote:
> On Sunday, 29 May 2016 at 12:08:52 UTC, default0 wrote:
>> I am pretty sure that a single grapheme in unicode does not 
>> correspond to your notion of "character". I am pretty sure 
>> that what you think of as a "character" is officially called 
>> "Grapheme Cluster" not "Grapheme".
>
> Grapheme is a linguistic term. AFAIUI, a grapheme cluster is a 
> cluster of code points representing a grapheme. It's called 
> "cluster" in the Unicode spec because there is no 
> dedicated grapheme unit.
>
> I put "character" in quotes because the term is not really 
> well defined. I just used it to keep the answer short and to 
> the point. I'm sure there's a better/more correct definition 
> of grapheme/phoneme, but it's probably also much longer and 
> more complicated.

Which is why we need to agree on terminology, i.e. be clear 
when we use linguistic terms and when we use Unicode-specific 
terminology.
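
As a rough illustration of why the terms need to be kept apart, 
here is a minimal sketch in Python (chosen only for brevity). It 
counts the same user-perceived character at three levels: UTF-8 
code units, code points, and an approximation of grapheme 
clusters. The helper name `count_units` is hypothetical, and the 
cluster count is a simplification based on combining classes, not 
the full segmentation rules of the Unicode spec.

```python
import unicodedata


def count_units(text):
    """Return (UTF-8 code units, code points, approx. grapheme clusters)."""
    code_units = len(text.encode("utf-8"))
    code_points = len(text)
    # Approximation only: every code point with canonical combining
    # class 0 is treated as the start of a new cluster. The real rules
    # (Unicode Standard Annex #29) cover more cases, such as Hangul
    # jamo and emoji joined with zero-width joiners.
    clusters = sum(1 for ch in text if unicodedata.combining(ch) == 0)
    return code_units, code_points, clusters


# "e" followed by U+0301 COMBINING ACUTE ACCENT: one grapheme to a reader.
decomposed = "e\u0301"
print(count_units(decomposed))  # (3, 2, 1)

# The precomposed form U+00E9 looks the same but is a single code point.
composed = unicodedata.normalize("NFC", decomposed)
print(count_units(composed))  # (2, 1, 1)
```

The two strings render identically, yet only the last number, the 
one closest to the linguistic notion of a grapheme, agrees between 
them. Which of the three counts is meant by "character" is exactly 
what has to be stated explicitly.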

