VLERange: a range in between BidirectionalRange and RandomAccessRange

spir denis.spir at gmail.com
Fri Jan 14 06:01:28 PST 2011


On 01/14/2011 07:26 AM, Nick Sabalausky wrote:
 > > "Andrei Alexandrescu"<SeeWebsiteForEmail at erdani.org>  wrote in message
 > > news:igoj6s$17r6$1 at digitalmars.com...
 >> >>
 >> >> I'm not so sure about that. What do you base this assessment on? Denis
 >> >> wrote a library that according to him does grapheme-related stuff nobody
 >> >> else does. So apparently graphemes is not what people care about (although
 >> >> it might be what they should care about).
 >> >>
 > >
 > > It's what they want, they just don't know it.
 > >
 > > Graphemes are what many people *think* code points are.
 > >
 >> >>
 >> >> This might be a good time to see whether we need to address graphemes
 >> >> systematically. Could you please post a few links that would educate me
 >> >> and others in the mysteries of combining characters?
 >> >>
 > >
 > > Maybe someone else has a link to an explanation (I don't), but it's
 > > basically just this:
If anyone finds a pointer to such an explanation, bravo, and thank you.
(You will certainly not find it in Unicode literature, for instance.)
Nick's explanation below is good and concise. (Just 2 notes added.)

 > > Three levels of abstraction from lowest to highest:
 > > - Code Unit (ie, encoding)
 > > - Code Point (ie, what Unicode assigns distinct numbers to)
 > > - Grapheme (ie, what we think of as a "character")
 > >
 > > A code-point can be made up of one or more code-units. Likewise, a grapheme
 > > can be made up of one or more code-points.
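
To make the three levels concrete, here is a minimal sketch in Python (an
illustration only, not taken from Nick's message or from any D library). It
assumes the single grapheme "ü" is written in decomposed form, as the base
letter "u" followed by the combining diaeresis U+0308, and counts what each
level sees. The variable names are made up for the example.

```python
# One grapheme, "ü", written as base letter + combining diaeresis (assumed
# decomposed form; the precomposed U+00FC would give different counts).
s = "u\u0308"

# Code units: the encoding level.
utf8_units = s.encode("utf-8")        # one byte per 8-bit code unit
utf16_bytes = s.encode("utf-16-le")   # two bytes per 16-bit code unit

n_utf8 = len(utf8_units)              # 1 byte for "u" + 2 bytes for U+0308
n_utf16 = len(utf16_bytes) // 2       # 1 unit for "u" + 1 unit for U+0308

# Code points: what Unicode assigns numbers to. A Python str is a
# sequence of code points, so len() counts them directly.
code_points = [f"U+{ord(c):04X}" for c in s]
n_code_points = len(s)

print(n_utf8, n_utf16, n_code_points, code_points)
```

All of these counts describe what a reader perceives as one "character",
which is the gap between the code point level and the grapheme level.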
 > >
 > > There are (at least) two types of code points:
 > >
 > > - Regular ones, such as letters, digits, and punctuation.
 > >
 > > - "Combining Characters", such as accent marks (or if you're familiar with
 > > Japanese, the little things in the upper-right corner that change an "s" to
 > > a "z" or an "h" to a "p". Or like German's umlaut - the two dots above a
 > > vowel). Ie, things that are not characters in their own right, but merely
 > > modify other characters. These can often (always?) be thought of as being
 > > like overlays.
You can also say there are 2 kinds of characters: simple ones like "u", and
composite ones like "ü" or "ṵ̈̈".
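
The simple/composite distinction can be sketched in a few lines of Python
(an illustration only; `rough_graphemes` is a hypothetical helper written
for this example, not a library function). It attaches every code point
with a nonzero canonical combining class to the preceding base character.
This is a deliberate simplification: real grapheme segmentation follows the
rules of Unicode Standard Annex #29, which cover more cases than combining
marks.

```python
import unicodedata

def rough_graphemes(text):
    """Group each combining mark with the base character before it.

    Approximation only: uses unicodedata.combining() (canonical combining
    class != 0) and ignores the other UAX #29 segmentation rules.
    """
    groups = []
    for ch in text:
        if groups and unicodedata.combining(ch):
            groups[-1] += ch      # overlay: attach to the previous base
        else:
            groups.append(ch)     # start a new group with a base character
    return groups

simple = "u"                      # one code point, one grapheme
composite = "u\u0308"             # u + combining diaeresis
stacked = "u\u0330\u0308"         # u + combining tilde below + diaeresis

text = simple + composite + stacked
groups = rough_graphemes(text)

# The same composite character may also exist as one precomposed code point.
nfc = unicodedata.normalize("NFC", composite)

print(len(text), len(groups), len(nfc))
```

Here six code points form three groups, one per perceived character, and
normalizing the decomposed "ü" to NFC yields the single precomposed code
point U+00FC.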

_________________
vita es estrany
spir.wikidot.com


