VLERange: a range in between BidirectionalRange and RandomAccessRange

spir denis.spir at gmail.com
Mon Jan 17 07:59:53 PST 2011


On 01/15/2011 11:45 PM, Michel Fortin wrote:
> That said, I'm sure if someone could redesign Unicode by breaking
> backward-compatibility we'd have something simpler. You could probably
> get rid of pre-combined characters and reduce the number of
> normalization forms. But would you be able to get rid of normalization
> entirely? I don't think so. Reinventing Unicode is probably not worth it.

I think like you about precomposed characters: they bring no real gain 
(even for easing migration from historic charsets, since texts must be 
decoded anyway, and mapping to one code or to several costs nothing).
But they complicate the design by allowing two parallel representation 
schemes (one character <--> one "code pile" versus one character <--> 
one precomposed code). And they put a heavy burden on software 
(and programmers) for correct indexing, slicing, comparison, 
search, counting, etc. This is where normalisation forms enter the game.
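To illustrate that burden, here is a small sketch (in Python, purely for illustration, since its `unicodedata` module makes the point compact): the same character can arrive as one precomposed code or as a base-plus-mark pile, and naive comparison sees two different strings.

```python
import unicodedata

precomposed = "\u00e9"   # 'é' as a single precomposed code point
decomposed = "e\u0301"   # 'e' followed by COMBINING ACUTE ACCENT

# Naive comparison fails: same character, different code sequences.
print(precomposed == decomposed)          # False
print(len(precomposed), len(decomposed))  # 1 2

# Software must normalise both sides before comparing.
nfd = unicodedata.normalize("NFD", precomposed)
print(nfd == decomposed)                  # True
```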
My choice would be:
* decomposed form only
* ordering imposed by the standard at text-composition time
==> no normalisation because everything is normalised from scratch.
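The second point, an ordering imposed at composition time, is what NFD's canonical ordering already does after the fact: combining marks on a base are sorted by their canonical combining class. A small Python sketch of that reordering (again just for illustration):

```python
import unicodedata

# 'e' + COMBINING ACUTE (class 230) + COMBINING DOT BELOW (class 220),
# typed "out of order". NFD's canonical ordering puts the lower-class
# mark (dot below) before the higher-class one (acute).
messy = "e\u0301\u0323"
ordered = unicodedata.normalize("NFD", messy)
print(ordered == "e\u0323\u0301")   # True
```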

What remains is what I call "piling". But we cannot easily get rid of it 
without adding separators to the standard UTF encodings.
I had the idea of UTF-33 ;-): an alternative, freely agreed-upon encoding 
that simply guarantees (in addition to what UTF-32 says) that the content 
is already normalised (NFD-decomposed and canonically ordered), either 
produced that way initially or already processed. Software can then 
happily read texts in and only think about piling when needed. UTF-33+ 
would additionally add "grapheme" separators (a costly solution in terms 
of space) to get rid of piling entirely.
The aim being to avoid stupidly doing the same job multiple 
times on the same text.
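As a rough sketch of what "piling" means for software (Python again; note this only groups a base character with its combining marks and ignores the full grapheme-cluster rules of UAX #29):

```python
import unicodedata

def piles(text):
    # Start a new pile at each code point whose combining class is 0
    # (a base character); attach combining marks to the current pile.
    out = []
    for ch in text:
        if unicodedata.combining(ch) == 0 or not out:
            out.append(ch)
        else:
            out[-1] += ch
    return out

# Decomposed "résumé": 8 code points, but 6 piles (characters).
print(piles("re\u0301sume\u0301"))
```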

Denis
_________________
vita es estrany
spir.wikidot.com


