Precomposed Character & Grapheme on wikipedia

spir denis.spir at gmail.com
Tue Jan 25 08:50:30 PST 2011


Hello,

I stumbled upon Wikipedia's article 
http://en.wikipedia.org/wiki/Precomposed_character which is, imo, excellent. 
(It does not (yet) deal with the consequences for programming with Unicode that 
we debated on this list.)
An enigmatic point is "Precomposed characters are the legacy solution for 
representing many special letters in various character sets." I still fail to 
see how precomposed characters help in solving issues posed by texts encoded in 
legacy character sets (since they need to be decoded anyway). Explanation welcome.
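
One possible reading, offered as a guess and not as what the article means: 
legacy 8-bit character sets such as Latin-1 store an accented letter as a single 
code, and Unicode keeps precomposed code points so that such text decodes one 
code to one code point and encodes back unchanged. Below is a minimal sketch in 
Python (chosen only for brevity); the helper name survives_latin1 is invented 
for illustration.

```python
import unicodedata

PRECOMPOSED = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE, one code point
DECOMPOSED = "e\u0301"   # 'e' followed by COMBINING ACUTE ACCENT, two code points


def survives_latin1(text):
    """Return True if text can be encoded to Latin-1 and decoded back unchanged."""
    try:
        return text.encode("latin-1").decode("latin-1") == text
    except UnicodeEncodeError:
        # Latin-1 has no code for combining marks such as U+0301.
        return False


# The Latin-1 byte 0xE9 decodes directly to the precomposed code point.
from_legacy = b"\xe9".decode("latin-1")

print(from_legacy == PRECOMPOSED)          # same single code point
print(PRECOMPOSED == DECOMPOSED)           # different code point sequences
print(unicodedata.normalize("NFC", DECOMPOSED) == PRECOMPOSED)
print(unicodedata.normalize("NFD", PRECOMPOSED) == DECOMPOSED)
print(survives_latin1(PRECOMPOSED), survives_latin1(DECOMPOSED))
```

The decoding step is still needed, as noted above; what the precomposed form 
buys is only that the mapping stays one-to-one in both directions.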

This article brought me to http://en.wikipedia.org/wiki/Grapheme. It seems I was 
partially wrong in stating that using "grapheme" to denote what we commonly 
think of as a character is an error. Possibly "grapheme" in English and "graphème" 
in French are not quite synonyms. For instance, "ph" is commonly regarded as a 
single grapheme in French (<--> phoneme /f/ indeed), so that grapheme and 
character are not at all synonyms; while according to en-wikipedia's article it 
may be 2 graphemes in English. What do you think?
There still remains the point that the notion of grapheme only applies to elements 
of writing systems (letters, syllables...), used to write 'words'. What we need 
is a term which, just like "character" in the context of computing, both for 
users and programmers, encompasses thingies like tabulation or newline marks, 
copyright or paragraph signs, and much more... even the null character ;-).
"Grapheme" is usable provided it is clearly defined as meaning that, precisely, 
in the context of UCS/Unicode. This is what Unicode literature and literature 
about Unicode do not do, AFAIK. Otherwise, it is just adding confusion over confusion.
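
To make the point concrete, here is a small sketch (in Python, only for 
brevity) that asks the Unicode character database for the general category of 
a few such "thingies". Each of them is a single code point, yet only the last 
one is a letter. The names SAMPLES, is_letter and is_control are invented for 
this illustration.

```python
import unicodedata

# A few things programmers call "characters"; each is one code point.
SAMPLES = {
    "tabulation": "\t",
    "newline": "\n",
    "null": "\x00",
    "copyright sign": "\u00a9",
    "paragraph sign": "\u00b6",
    "letter a": "a",
}


def is_letter(ch):
    """True if the general category is one of the letter categories (L*)."""
    return unicodedata.category(ch).startswith("L")


def is_control(ch):
    """True if the general category is Cc (control)."""
    return unicodedata.category(ch) == "Cc"


for label, ch in SAMPLES.items():
    # Print the code point, our label, and the two-letter general category.
    print(f"U+{ord(ch):04X} {label}: category {unicodedata.category(ch)}")
```

Whatever term is chosen has to cover every entry in that table, not just the 
ones that belong to a writing system.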

Denis
-- 
_________________
vita es estrany
spir.wikidot.com



More information about the Digitalmars-d mailing list