Precomposed Character & Grapheme on Wikipedia

Nick Sabalausky a at a.a
Tue Jan 25 12:05:37 PST 2011


"spir" <denis.spir at gmail.com> wrote in message 
news:mailman.940.1295974243.4748.digitalmars-d at puremagic.com...
> Hello,
>
> I stumbled upon Wikipedia's article 
> http://en.wikipedia.org/wiki/Precomposed_character which is, imo, 
> excellent. (It does not (yet) address the consequences for programming with 
> Unicode that we debated on this list.)
> An enigmatic point is "Precomposed characters are the legacy solution for 
> representing many special letters in various character sets." I still fail 
> to see how precomposed characters help solve the issues posed by texts 
> encoded in legacy character sets (since those texts need to be decoded anyway). 
> Explanation welcome.
>

My guess (and this is only a guess) is that they felt it would make 
rendering easier, because 1. many fonts already had precomposed characters but 
may not have had any of the "modifier" marks by themselves, and 2. font 
rendering libraries probably didn't support characters with "overlays".
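For anyone who hasn't seen the two encodings side by side: Unicode can represent a letter like "é" either as one precomposed code point or as a base letter followed by a combining mark. The minimal sketch below uses Python's standard `unicodedata` module (chosen only for illustration; nothing in this thread uses it) to show that the two forms are different code point sequences meant to display the same, and that NFC/NFD normalization converts between them.

```python
import unicodedata

# "é" as a single precomposed code point: U+00E9
precomposed = "\u00e9"
# "é" as base letter U+0065 followed by U+0301 (a combining mark)
decomposed = "e\u0301"

# Same intended glyph, but different code point sequences.
print(precomposed == decomposed)           # False
print(len(precomposed), len(decomposed))   # 1 2

# Character names show what each form is made of.
print(unicodedata.name(precomposed))               # LATIN SMALL LETTER E WITH ACUTE
print([unicodedata.name(c) for c in decomposed])   # ['LATIN SMALL LETTER E', 'COMBINING ACUTE ACCENT']

# NFC composes, NFD decomposes.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True
```

A plain code-point comparison treats the two strings as different. This is one reason text is usually normalized to a single form before comparing or searching it.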


> This article brought me to http://en.wikipedia.org/wiki/Grapheme. It seems I 
> was partially wrong in stating that using "grapheme" to denote what we 
> commonly think of as a character is an error. Possibly "grapheme" in English 
> and "graphème" in French are not quite synonyms. For instance, "ph" is 
> commonly regarded as a single grapheme in French (<--> phoneme /f/ 
> indeed), so grapheme and character are not synonyms at all; whereas 
> according to the English Wikipedia article it may count as two graphemes in 
> English. What do you think?

No, a grapheme is the common notion of a character.

A phoneme is an atomic unit of vocal *sound*. So all that article is saying 
is that a grapheme (a single written unit) can represent any of the following:

- One specific sound
- No particular sound (like '&' or Chinese characters)
- Different sounds depending on context (like the English 'c')
- Or, as with the French 'ph', the Japanese 'kyou', or the German 'sch', 
multiple graphemes can form one sound. These are known as digraphs and 
trigraphs.
