VLERange: a range in between BidirectionalRange and RandomAccessRange

Fri Jan 14 04:47:17 PST 2011

On 14.01.2011 08:00, Nick Sabalausky wrote:
> "Nick Sabalausky"<a at a.a>  wrote in message
> news:igori7$1ovh$1 at digitalmars.com...
>> "Andrei Alexandrescu"<SeeWebsiteForEmail at erdani.org>  wrote in message
>> news:igoqrm$1n5r$1 at digitalmars.com...
>>> On 1/13/11 10:26 PM, Nick Sabalausky wrote:
>>> [snip]
>>>> [ 'f', {u with the umlaut}, 'n', 'f' ]
>>>>
>>>> Or:
>>>>
>>>> [ 'f', 'u', {umlaut combining character}, 'n', 'f' ]
>>>>
>>>> Those *both* get rendered exactly the same, and both represent the same
>>>> four-letter sequence. In the second example, the 'u' and the {umlaut
>>>> combining character} combine to form one grapheme. The f's and n's just
>>>> happen to be single-code-point graphemes.
>>>>
>>>> Note that while some characters exist in pre-combined form (such as the
>>>> {u with the umlaut} above), legend has it there are others that can only be
>>>> represented using a combining character.
>>>>
>>>> It's also my understanding, though I'm not certain, that sometimes
>>>> multiple combining characters can be used together on the same "root"
>>>> character.
>>>
>>> Thanks. One further question is: in the above example with u-with-umlaut,
>>> there is one code point that corresponds to the entire combination. Are
>>> there combinations that do not have a unique code point?
>>>
>>
>> My understanding is "yes". At least that's what I've heard, and I've never
>> heard any claims of "no". I don't know of any specific ones offhand,
>> though. Actually, it might be possible to use any combining character with
>> any old letter or number (like maybe a 7 with an umlaut), though I'm not
>> certain.
>>
>> FWIW, the Wikipedia article might help, or at least link to other things
>> that might help: http://en.wikipedia.org/wiki/Combining_character
>>
>> Michel or spir might have better links though.
>>
>
> Heh, as if that wasn't bad enough, there are also digraphs which, from what
> I can tell, seem to be single code points that represent more than one
> glyph/character/grapheme:
>
> http://en.wikipedia.org/wiki/Digraph_(orthography)#Digraphs_in_Unicode
>
> This page may be helpful too:
> http://en.wikipedia.org/wiki/Precomposed_character
>

OMG, this is really fucked up.
Can't we just go back to 8bit charsets like ISO 8859-* etc? :/
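
For anyone who wants to poke at the cases quoted above, here is a minimal sketch in Python (not D, purely because its standard unicodedata module makes this a few lines). It assumes "fünf" as the four-letter word from Nick's example, and the variable names are made up for illustration. It compares the precomposed and the combining-character spelling, tries a 7 with an umlaut, and looks at one of the digraph code points from the Wikipedia page.

```python
import unicodedata

# The two spellings from the quoted example.
precomposed = "f\u00fcnf"   # 'f', U+00FC (u with diaeresis), 'n', 'f'
decomposed = "fu\u0308nf"   # 'f', 'u', U+0308 (combining diaeresis), 'n', 'f'

# Same rendering, but a different number of code points,
# so a plain comparison treats them as different strings.
print(len(precomposed), len(decomposed))
print(precomposed == decomposed)

# Normalization maps both spellings to one canonical form.
nfc = unicodedata.normalize("NFC", decomposed)   # compose where possible
nfd = unicodedata.normalize("NFD", precomposed)  # decompose fully
print(nfc == precomposed, nfd == decomposed)

# A 7 with an umlaut: no precomposed code point exists for it,
# so NFC has to leave the two code points as they are.
seven = "7\u0308"
seven_nfc = unicodedata.normalize("NFC", seven)
print(len(seven_nfc))

# A digraph: U+01C6 is one code point for the letter pair "dz with caron".
# Only the compatibility forms (NFKC/NFKD) split it into separate letters.
digraph = "\u01c6"
digraph_nfkc = unicodedata.normalize("NFKC", digraph)
print(len(digraph), len(digraph_nfkc))
```

So the answer to Andrei's question appears to be "yes": the 7 with an umlaut stays at two code points even after composition. Note that normalization alone does not split a string into graphemes; that needs a separate segmentation step, which this sketch does not attempt.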