VLERange: a range in between BidirectionalRange and RandomAccessRange
spir
denis.spir at gmail.com
Fri Jan 14 06:01:28 PST 2011
On 01/14/2011 07:26 AM, Nick Sabalausky wrote:
> "Andrei Alexandrescu" <SeeWebsiteForEmail at erdani.org> wrote in message
> news:igoj6s$17r6$1 at digitalmars.com...
>>
>> I'm not so sure about that. What do you base this assessment on? Denis
>> wrote a library that according to him does grapheme-related stuff nobody
>> else does. So apparently graphemes is not what people care about (although
>> it might be what they should care about).
>>
>
> It's what they want, they just don't know it.
>
> Graphemes are what many people *think* code points are.
>
>>
>> This might be a good time to see whether we need to address graphemes
>> systematically. Could you please post a few links that would educate me
>> and others in the mysteries of combining characters?
>>
>
> Maybe someone else has a link to an explanation (I don't), but it's
> basically just this:
If anyone finds a pointer to such an explanation, bravo, and thank you.
(You will certainly not find it in Unicode literature, for instance.)
Nick's explanation below is good and concise. (Just 2 notes added.)
> Three levels of abstraction from lowest to highest:
> - Code Unit (ie, encoding)
> - Code Point (ie, what Unicode assigns distinct numbers to)
> - Grapheme (ie, what we think of as a "character")
>
> A code-point can be made up of one or more code-units. Likewise, a grapheme
> can be made up of one or more code-points.
>
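To make the three levels concrete, here is a small Python sketch (an added illustration, not code from this thread). It assumes UTF-8 for the code-unit level and approximates graphemes by counting code points that are not combining marks; proper grapheme segmentation follows the rules of Unicode's UAX #29.

```python
import unicodedata

def count_levels(text: str) -> tuple[int, int, int]:
    """Return (UTF-8 code units, code points, approximate graphemes)."""
    # Assumption: UTF-8 encoding; UTF-16 or UTF-32 would give other unit counts.
    code_units = len(text.encode("utf-8"))
    # A Python str is a sequence of code points.
    code_points = len(text)
    # Rough approximation: every non-combining code point starts a new grapheme.
    graphemes = sum(1 for cp in text if unicodedata.combining(cp) == 0)
    return code_units, code_points, graphemes

word = "u\u0308ber"  # "über" spelled as u + COMBINING DIAERESIS
print(count_levels(word))  # (6, 5, 4)
```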
> There are (at least) two types of code points:
>
> - Regular ones, such as letters, digits, and punctuation.
>
> - "Combining Characters", such as accent marks (or if you're familiar with
> Japanese, the little things in the upper-right corner that change an "s" to
> a "z" or an "h" to a "p". Or like the German umlaut - the two dots above a
> vowel). Ie, things that are not characters in their own right, but merely
> modify other characters. These can often (always?) be thought of as
> overlays.
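The "overlay" behaviour can be checked with Python's standard `unicodedata` module (again an added illustration, not from the thread). The same visible "ü" exists both as one precomposed code point and as "u" followed by a combining diaeresis, and the Japanese voiced-sound marks Nick mentions behave the same way:

```python
import unicodedata

precomposed = "\u00fc"  # LATIN SMALL LETTER U WITH DIAERESIS
decomposed = "u\u0308"  # LATIN SMALL LETTER U + COMBINING DIAERESIS

# Same grapheme on screen, different code point sequences.
print(precomposed == decomposed)                  # False
print([unicodedata.name(c) for c in decomposed])  # ['LATIN SMALL LETTER U', 'COMBINING DIAERESIS']

# Canonical normalization converts between the two spellings.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True

# Combining marks have a nonzero canonical combining class; base letters have 0.
print(unicodedata.combining("u"), unicodedata.combining("\u0308"))  # 0 230

# Japanese: HIRAGANA LETTER SA + voiced sound mark composes to HIRAGANA LETTER ZA,
# and HIRAGANA LETTER HA + semi-voiced sound mark composes to HIRAGANA LETTER PA.
print(unicodedata.normalize("NFC", "\u3055\u3099") == "\u3056")  # True
print(unicodedata.normalize("NFC", "\u306f\u309a") == "\u3071")  # True
```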
You can also say there are 2 kinds of characters: simple ones like "u", and
composite ones like "ü" or "ṵ̈̈".
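The simple/composite distinction suggests a naive way to split text into user-perceived characters: attach each combining mark to the base code point before it. The Python sketch below does only that (a simplification, neither Denis's library nor the full UAX #29 algorithm), and the code-point spelling chosen for the composite example is an assumption:

```python
import unicodedata

def split_characters(text: str) -> list[str]:
    """Group each base code point with the combining marks that follow it.

    Simplified sketch: a code point counts as combining when its canonical
    combining class is nonzero. Full grapheme segmentation (UAX #29) has
    more rules (Hangul syllables, emoji sequences, spacing marks, ...).
    """
    clusters: list[str] = []
    for cp in text:
        if clusters and unicodedata.combining(cp) != 0:
            clusters[-1] += cp   # overlay the mark on the previous character
        else:
            clusters.append(cp)  # start a new character
    return clusters

# Assumed spelling: u + COMBINING TILDE BELOW + two COMBINING DIAERESIS marks.
composite = "u\u0330\u0308\u0308"
text = "u" + "\u00fc" + composite
chars = split_characters(text)
print(len(text), len(chars))    # 6 3
print([len(c) for c in chars])  # [1, 1, 4]
```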
_________________
vita es estrany
spir.wikidot.com