VLERange: a range in between BidirectionalRange and RandomAccessRange

Thu Jan 13 16:44:49 PST 2011

On 01/13/2011 11:00 PM, Nick Sabalausky wrote:
> "Andrei Alexandrescu"<SeeWebsiteForEmail at erdani.org>  wrote in message
> news:ignon1$2p4k$1 at digitalmars.com...
>>
>> This may sometimes not be what the user expected; most of the time they'd
>> care about the code points.
>>
>
> I dunno, spir has successfully convinced me that most of the time it's
> graphemes the user cares about, not code points. Using code points is just
> as misleading as using UTF-16 code units.

You are right that those two issues are really analogous. In practice,
once universal text is truly in common use, I guess the problem that
codes do not represent characters will become far more obvious, and also
far more serious, because (logical) errors can easily pass unnoticed.
[In fact, how can a programmer even know, for instance, that a search
routine missed its target or returned a false positive when dealing
with characters from languages they do not know? There are test data
sets, of course, but they are useless if the tools one uses simply
ignore the issues.]
Using a 16-bit representation, and thus ignoring a fair number of code
points, is perhaps less of a problem, because there is little chance of
randomly meeting characters outside the BMP (Basic Multilingual Plane,
the part of UCS whose code points are < 0x10000).
Outside the BMP are the scripts of less commonly studied archaeological
languages, and various sets of pictorial symbols such as alchemical
symbols, playing cards or domino tiles. I doubt they'll ever be commonly
used; and in the specialised apps that do use them, the programmer knows
perfectly well what they are dealing with.
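
To make the search problem above concrete, here is a minimal sketch in
Python (chosen only for illustration; it is not the language under
discussion). The sample strings and the helper names naive_find and
normalized_find are my own assumptions. It shows a code-point-level
search both missing its target and returning a false positive, and how
normalising both sides to NFC repairs this particular case. NFC is not
full grapheme handling: it only helps when a precomposed form exists.

```python
import unicodedata

# The same visible text, "café", in two encodings of the last character.
precomposed = "caf\u00e9"    # 'é' as a single code point, U+00E9
decomposed = "cafe\u0301"    # 'e' followed by U+0301 COMBINING ACUTE ACCENT


def naive_find(haystack, needle):
    """Search code point by code point, ignoring combining sequences."""
    return haystack.find(needle)


def normalized_find(haystack, needle):
    """Normalise both sides to NFC before searching (hypothetical helper)."""
    nfc_haystack = unicodedata.normalize("NFC", haystack)
    nfc_needle = unicodedata.normalize("NFC", needle)
    return nfc_haystack.find(nfc_needle)


# Missed target: the user looks for 'é' but the text stores it decomposed.
missed = naive_find(decomposed, "\u00e9")

# False positive: the user looks for a plain 'e', and the search matches
# the base letter of what is displayed as 'é'.
false_positive = naive_find(decomposed, "e")

# After normalisation the 'é' is found and the plain 'e' no longer matches.
found = normalized_find(decomposed, "\u00e9")
no_plain_e = normalized_find(decomposed, "e")

print(missed, false_positive, found, no_plain_e)
```

Neither of the first two results raises an error, which is exactly why
such mistakes can pass unnoticed.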

A list of UCS blocks with pointers to detailed content can be found here:
http://www.fileformat.info/info/unicode/block/index.htm
Blocks beyond the BMP start with the line:
Linear B Syllabary 	U+10000 	U+1007F 	(88)
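
As a rough illustration of the BMP boundary, the following Python sketch
(again only an illustration; the names BMP_LIMIT, is_outside_bmp and
utf16_code_units are assumptions of mine) checks whether a character
lies at or above U+10000 and counts how many 16-bit code units its
UTF-16 encoding takes. The first code point of the Linear B Syllabary
block needs two code units, a surrogate pair, which is what a purely
16-bit representation cannot hold in one unit.

```python
BMP_LIMIT = 0x10000  # code points below this value belong to the BMP


def is_outside_bmp(ch):
    """Return True if the single character ch lies beyond the BMP."""
    return ord(ch) >= BMP_LIMIT


def utf16_code_units(text):
    """Count the 16-bit code units needed to encode text as UTF-16."""
    # "utf-16-le" emits no byte order mark, so every 2 bytes is one unit.
    return len(text.encode("utf-16-le")) // 2


first_linear_b = "\U00010000"  # first code point of the Linear B Syllabary block
latin_a = "A"

print(is_outside_bmp(latin_a), utf16_code_units(latin_a))
print(is_outside_bmp(first_linear_b), utf16_code_units(first_linear_b))
```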

Denis
_________________
vita es estrany
spir.wikidot.com