VLERange: a range in between BidirectionalRange and RandomAccessRange
Nick Sabalausky
a at a.a
Fri Jan 14 11:20:40 PST 2011
"spir" <denis.spir at gmail.com> wrote in message
news:mailman.619.1295012086.4748.digitalmars-d at puremagic.com...
>
> If anyone finds a pointer to such an explanation, bravo, and thank you.
> (You will certainly not find it in Unicode literature, for instance.)
> Nick's explanation below is good and concise. (Just 2 notes added.)
Yea, most Unicode explanations seem to talk all about "code-units vs
code-points" and then they'll just have a brief note like "There's also
other things like digraphs and combining codes." And that'll be all they
mention.
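The gap between code units, code points, and user-perceived characters can be seen side by side in a small sketch. Python is used only because its standard unicodedata module exposes the needed properties. unicode_counts is a hypothetical helper, and its "characters" figure is a deliberately naive approximation (non-mark code points), not the real grapheme-cluster algorithm of UAX #29.

```python
import unicodedata

def unicode_counts(s):
    """Count one string at three levels: code units, code points, characters."""
    return {
        "utf8_units": len(s.encode("utf-8")),
        "utf16_units": len(s.encode("utf-16-le")) // 2,  # "-le" avoids a BOM
        "code_points": len(s),
        # Naive approximation: every code point that is not a combining mark
        # (general category M*) starts a new user-perceived character.
        "characters": sum(1 for c in s if not unicodedata.category(c).startswith("M")),
    }

print(unicode_counts("\u00fc"))   # precomposed u-umlaut
# {'utf8_units': 2, 'utf16_units': 1, 'code_points': 1, 'characters': 1}
print(unicode_counts("u\u0308"))  # 'u' + COMBINING DIAERESIS
# {'utf8_units': 3, 'utf16_units': 2, 'code_points': 2, 'characters': 1}
```

Both strings display as the same single character, yet every lower-level count differs, which is exactly the layer the usual "code units vs code points" explanations stop short of.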
You're right about the Unicode literature. It's the usual standards-body
documentation, same as W3C: "Instead of only some people understanding how
this works, let's encode the documentation in legalese (and have twenty
only-slightly-different versions) to make sure that nobody understands how
it works."
> You can also say there are 2 kinds of characters: simple ones like "u" and
> composite ones like "ü" (or "ü" stacked with further combining marks). The
> former are coded with a single (base) code, the latter with one base code
> (rarely more) and an arbitrary number of combining codes.
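The simple/composite distinction above can be made concrete with a minimal sketch, assuming that "combining code" corresponds to Unicode general category M* (Mn, Mc, Me). split_character is a hypothetical helper; it first applies NFD so that a precomposed "ü" is split the same way as its decomposed spelling.

```python
import unicodedata

def split_character(ch):
    """Split one (possibly composite) character into base codes and combining codes."""
    decomposed = unicodedata.normalize("NFD", ch)  # undo any precomposition
    bases = [c for c in decomposed if not unicodedata.category(c).startswith("M")]
    marks = [c for c in decomposed if unicodedata.category(c).startswith("M")]
    return bases, marks

def hexes(cs):
    return [f"U+{ord(c):04X}" for c in cs]

for ch in ["u", "\u00fc", "u\u0308\u0301"]:  # simple, composite, doubly marked
    bases, marks = split_character(ch)
    print(hexes(bases), hexes(marks))
# ['U+0075'] []
# ['U+0075'] ['U+0308']
# ['U+0075'] ['U+0308', 'U+0301']
```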
Couple questions about the "more than one base codes":
- Do you know an example offhand?
- Does that mean like a ligature where the base codes form a single glyph,
or does it mean that the combining code either spans or operates over
multiple glyphs? Or can it go either way?
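One kind of sequence that may fit the "more than one base code" description (whether it is the case spir had in mind is an open question) is a double diacritic such as U+0361 COMBINING DOUBLE INVERTED BREVE, the IPA tie bar. The sketch below, built around a hypothetical describe helper, lists the code points of a tied "ts": the single mark is stored after the first base code but is drawn across both glyphs, which is one concrete answer to the "spans multiple glyphs" variant of the question.

```python
import unicodedata

def describe(s):
    """List (code point, general category, name) for each code point in s."""
    return [(f"U+{ord(c):04X}", unicodedata.category(c), unicodedata.name(c)) for c in s]

tied = "t\u0361s"  # "ts" joined by a tie bar over both letters
for row in describe(tied):
    print(*row)
# U+0074 Ll LATIN SMALL LETTER T
# U+0361 Mn COMBINING DOUBLE INVERTED BREVE
# U+0073 Ll LATIN SMALL LETTER S
```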
> For a majority of _common_ characters made of 2 or 3 codes (Western
> language letters, Korean Hangul syllables, ...), precomposed codes have
> been added to the set. Thus, they can be coded with a single code like
> simple characters.
>
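Unicode's term for these single-code forms is "precomposed", and normalization (NFC composes, NFD decomposes) converts between them and the multi-code spelling. A minimal sketch using only the standard unicodedata module; the "q with diaeresis" line illustrates that a composite with no precomposed form stays multi-code even after NFC.

```python
import unicodedata

precomposed = "\u00fc"  # U+00FC LATIN SMALL LETTER U WITH DIAERESIS
decomposed = "u\u0308"  # U+0075 + U+0308 COMBINING DIAERESIS

# Same character to a reader, different code sequences to a naive comparison.
print(precomposed == decomposed)                               # False
# Normalization maps between the two canonically equivalent forms.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True
# There is no precomposed "q with diaeresis", so NFC leaves two code points.
print(len(unicodedata.normalize("NFC", "q\u0308")))             # 2
```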
Out of curiosity, how do decomposed Hangul characters work? (Or do you
know?) I don't actually know any Korean, but my understanding is that they're a
set of 1 to 4 phonetic glyphs that are then combined into one glyph. So, is
it like a series of base codes that automatically combine, or are there
combining characters involved?
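As far as the Unicode normalization rules go, precomposed Hangul syllables (U+AC00 to U+D7A3) decompose arithmetically into two or three conjoining jamo: a leading consonant, a vowel, and an optional trailing consonant. The jamo are ordinary letters (general category Lo, combining class 0) rather than combining marks, so the answer seems closer to "a series of base codes that automatically combine". The constants below follow the Hangul syllable decomposition algorithm in chapter 3 of the Unicode Standard; decompose_hangul is a hypothetical helper, checked here against Python's own NFD.

```python
import unicodedata

# Constants of the Hangul syllable decomposition algorithm.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT  # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT  # 11172 precomposed syllables in total

def decompose_hangul(syllable):
    """Return leading consonant + vowel (+ optional trailing consonant) jamo."""
    s_index = ord(syllable) - S_BASE
    if not 0 <= s_index < S_COUNT:
        return syllable  # not a precomposed Hangul syllable: leave unchanged
    lead = chr(L_BASE + s_index // N_COUNT)
    vowel = chr(V_BASE + (s_index % N_COUNT) // T_COUNT)
    t_index = s_index % T_COUNT  # 0 means "no trailing consonant"
    return lead + vowel + (chr(T_BASE + t_index) if t_index else "")

jamo = decompose_hangul("\ud55c")  # the syllable "han"
print([f"U+{ord(c):04X}" for c in jamo])          # ['U+1112', 'U+1161', 'U+11AB']
print([unicodedata.combining(c) for c in jamo])   # [0, 0, 0]: no combining marks
print(unicodedata.normalize("NFC", jamo) == "\ud55c")  # True: NFC recombines them
```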
> [Also note, to keep things from being too simple ;-), some (few) combining
> codes called "prepend" come _before_ the base in the raw code sequence...]
>
Fun!
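The "prepend" codes correspond to the Prepend value of the Grapheme_Cluster_Break property in UAX #29 (Unicode Text Segmentation). Python's standard library does not expose that property, so the sketch below assumes a tiny hand-picked PREPEND set containing U+0600 ARABIC NUMBER SIGN, which precedes the digits it applies to; the authoritative list is the Unicode data file GraphemeBreakProperty.txt. naive_clusters is a hypothetical helper that shows only the two directions of attachment: ordinary marks glue to the preceding base, prepend codes to the following one.

```python
import unicodedata

# Assumed stand-in for Grapheme_Cluster_Break=Prepend (far from complete).
PREPEND = {"\u0600"}  # ARABIC NUMBER SIGN

def naive_clusters(s):
    """Group code points into rough user-perceived characters (not full UAX #29)."""
    clusters = []
    pending = ""  # prepend codes still waiting for their base
    for c in s:
        if c in PREPEND:
            pending += c
        elif unicodedata.category(c).startswith("M") and clusters and not pending:
            clusters[-1] += c  # ordinary combining code: glue to previous cluster
        else:
            clusters.append(pending + c)  # new base, carrying any prepend codes
            pending = ""
    if pending:
        clusters.append(pending)  # dangling prepend codes at the end of the string
    return clusters

def show(s):
    return [[f"U+{ord(c):04X}" for c in cl] for cl in naive_clusters(s)]

print(show("u\u0308x"))   # [['U+0075', 'U+0308'], ['U+0078']]
print(show("\u060012"))   # [['U+0600', 'U+0031'], ['U+0032']]
```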
More information about the Digitalmars-d
mailing list