VLERange: a range in between BidirectionalRange and RandomAccessRange

Nick Sabalausky a at a.a
Fri Jan 14 11:20:40 PST 2011


"spir" <denis.spir at gmail.com> wrote in message 
news:mailman.619.1295012086.4748.digitalmars-d at puremagic.com...
>
> If anyone finds a pointer to such an explanation, bravo, and thank you. 
> (You will certainly not find it in Unicode literature, for instance.)
> Nick's explanation below is good and concise. (Just 2 notes added.)

Yea, most Unicode explanations seem to talk all about "code-units vs 
code-points" and then they'll just have a brief note like "There's also 
other things like digraphs and combining codes." And that'll be all they 
mention.
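
To make the three levels concrete, here's a rough sketch in Python (chosen only because its standard `unicodedata` module is handy; the `describe` helper is a made-up name for illustration). It counts code points, code units, and combining codes for "ü" written both ways:

```python
import unicodedata

# "ü" written two ways: one precomposed code point,
# or base "u" followed by COMBINING DIAERESIS (U+0308).
precomposed = "\u00fc"
decomposed = "u\u0308"

def describe(s):
    """Return (code points, UTF-8 code units, UTF-16 code units, combining codes)."""
    code_points = len(s)                           # Python str is indexed by code point
    utf8_units = len(s.encode("utf-8"))            # one code unit = one byte
    utf16_units = len(s.encode("utf-16-le")) // 2  # one code unit = two bytes
    combining = sum(1 for ch in s if unicodedata.combining(ch) != 0)
    return code_points, utf8_units, utf16_units, combining

print(describe(precomposed))  # (1, 2, 1, 0)
print(describe(decomposed))   # (2, 3, 2, 1)
```

Both strings render as a single "ü", yet none of the counts agree, which is exactly the part the brief notes tend to skip.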

You're right about the Unicode literature. It's the usual standards-body 
documentation, same as W3C: "Instead of only some people understanding how 
this works, let's encode the documentation in legalese (and have twenty 
only-slightly-different versions) to make sure that nobody understands how 
it works."

> You can also say there are 2 kinds of characters: simple like "u" & 
> composite "ü" or "ü??". The former are coded with a single (base) code, 
> the latter with one (rarely more) base codes and an arbitrary number of 
> combining codes.

A couple of questions about the "more than one base code" case:

- Do you know an example offhand?

- Does that mean like a ligature where the base codes form a single glyph, 
or does it mean that the combining code either spans or operates over 
multiple glyphs? Or can it go either way?

> For a majority of _common_ characters made of 2 or 3 codes (western 
> language letters, korean Hangul syllables,...), precombined codes have 
> been added to the set. Thus, they can be coded with a single code like 
> simple characters.
>
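
A small sketch of that precombined-code idea, assuming Python's `unicodedata.normalize` (NFC composes where a precomposed code exists, NFD decomposes). The "q" plus diaeresis pair is my own pick of a combination that, as far as I know, has no precomposed code:

```python
import unicodedata

decomposed = "u\u0308"  # base "u" + COMBINING DIAERESIS

# NFC folds the pair into the single precomposed code U+00FC.
composed = unicodedata.normalize("NFC", decomposed)
print([hex(ord(c)) for c in composed])  # ['0xfc']

# NFD goes the other way and restores the base + combining pair.
print(unicodedata.normalize("NFD", composed) == decomposed)  # True

# A combination without a precomposed code stays as two code points.
no_precomposed = unicodedata.normalize("NFC", "q\u0308")
print(len(no_precomposed))  # 2
```

So the single-code form is only a shortcut for the common cases; anything else still has to be handled as base plus combining codes.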

Out of curiosity, how do decomposed Hangul characters work? (Or do you 
know?) Not actually knowing any Korean, my understanding is that they're a 
set of 1 to 4 phonetic glyphs that are then combined into one glyph. So, is 
it like a series of base codes that automatically combine, or are there 
combining characters involved?
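
One way to poke at this empirically: a sketch in Python, assuming its `unicodedata` module follows the standard's canonical decomposition. The sample syllable is just an arbitrary pick. It decomposes one precomposed syllable and prints each resulting code with its name and canonical combining class:

```python
import unicodedata

syllable = "\ud55c"  # HANGUL SYLLABLE HAN, a single precomposed code point

# NFD splits a precomposed syllable into its conjoining jamo.
jamo = unicodedata.normalize("NFD", syllable)

for ch in jamo:
    # combining() returns the canonical combining class; 0 means "not a combining mark".
    print(hex(ord(ch)), unicodedata.name(ch), unicodedata.combining(ch))

# NFC puts the jamo sequence back together into the one syllable.
print(unicodedata.normalize("NFC", jamo) == syllable)
```

For this syllable the result is three jamo (leading consonant, vowel, trailing consonant), all with combining class 0. That suggests the "series of codes that automatically combine" reading rather than ordinary combining marks, though this only shows what canonical decomposition produces (two or three jamo per precomposed syllable), not everything a renderer might do.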

> [Also note, to avoid things be too simple ;-), some (few) combining codes 
> called "prepend" come _before_ the base in raw code sequence...]
>

Fun!




