VLERange: a range in between BidirectionalRange and RandomAccessRange
spir
denis.spir at gmail.com
Thu Jan 13 16:44:49 PST 2011
On 01/13/2011 11:00 PM, Nick Sabalausky wrote:
> "Andrei Alexandrescu"<SeeWebsiteForEmail at erdani.org> wrote in message
> news:ignon1$2p4k$1 at digitalmars.com...
>>
>> This may sometimes not be what the user expected; most of the time they'd
>> care about the code points.
>>
>
> I dunno, spir has successfully convinced me that most of the time it's
> graphemes the user cares about, not code points. Using code points is just
> as misleading as using UTF-16 code units.
You are right in that those 2 issues are really analogous. In practice,
once universal text is truly and commonly used, I guess the problems caused
by codes that do not represent characters may become far more obvious, and
also far more serious, because (logical) errors can easily pass unseen.
[In fact, how can a programmer even know, for instance, that a search
routine missed its target or returned a false positive when dealing
with characters from unknown languages? Sure, there are test data
sets, but they are useless if the tools one uses just ignore the issues.]
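To make that kind of silent miss concrete, here is a minimal sketch in Python (used only for illustration; it is not from this thread, and the helper names are hypothetical). It assumes the same visible word is stored once with a precomposed 'é' and once as 'e' plus a combining accent, and shows a plain code-point search both missing its target and returning a false positive:

```python
import unicodedata

# Two encodings of the same visible text "café":
precomposed = "caf\u00e9"    # 'é' as a single code point, U+00E9
decomposed = "cafe\u0301"    # 'e' followed by U+0301 COMBINING ACUTE ACCENT


def naive_contains(text, pattern):
    """Plain substring search: compares code point by code point."""
    return pattern in text


def normalized_contains(text, pattern):
    """Hypothetical helper: bring both sides to NFC before searching."""
    def nfc(s):
        return unicodedata.normalize("NFC", s)
    return nfc(pattern) in nfc(text)


# The reader sees an 'é' in the text, but the search does not find it.
missed = naive_contains(decomposed, "\u00e9")

# The reader sees no bare 'e' in the text, but the search reports "cafe".
false_hit = naive_contains(decomposed, "cafe")

# With both sides normalized, the 'é' is found.
found = normalized_contains(decomposed, "\u00e9")
```

Normalization only papers over the cases where a precomposed form exists; it is a stand-in here for real grapheme-aware comparison, not a replacement for it.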
Using a 16-bit representation, and thus ignoring a fair number of
code points, is maybe less problematic, because there is rather little
chance of randomly meeting characters outside the BMP (Basic
Multilingual Plane, the part of UCS whose code points are < 0x10000).
Outside the BMP are the writing systems of less commonly studied
ancient languages, and various sets of symbols such as alchemical
symbols, playing cards or domino tiles. I doubt they'll ever be commonly
used, except in specialised apps where the programmer knows perfectly
well what they are dealing with.
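As a rough illustration of what "ignoring" those code points means, here is a small Python sketch (again only illustrative; `utf16_units` is a hypothetical helper). It assumes UTF-16 encoding and shows that a code point below 0x10000 fits in one 16-bit unit, while U+10000, the first code point beyond the BMP, needs a surrogate pair:

```python
def utf16_units(s):
    """Number of 16-bit code units needed to encode s in UTF-16."""
    return len(s.encode("utf-16-le")) // 2


bmp_char = "\u00e9"        # U+00E9, inside the BMP
beyond_bmp = "\U00010000"  # U+10000, first code point of the Linear B Syllabary block

# Both are a single code point...
code_points = (len(bmp_char), len(beyond_bmp))

# ...but they differ in the number of UTF-16 code units.
units = (utf16_units(bmp_char), utf16_units(beyond_bmp))

# Split the encoded form of U+10000 into its two 16-bit halves.
encoded = beyond_bmp.encode("utf-16-be")
high = int.from_bytes(encoded[:2], "big")  # high (lead) surrogate
low = int.from_bytes(encoded[2:], "big")   # low (trail) surrogate
```

Code that treats every 16-bit unit as one character would see two "characters" for U+10000, neither of which means anything on its own.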
A list of UCS blocks with pointers to detailed content can be found here:
http://www.fileformat.info/info/unicode/block/index.htm
Blocks beyond the BMP start with the line:
Linear B Syllabary U+10000 U+1007F (88)
Denis
_________________
vita es estrany
spir.wikidot.com