VLERange: a range in between BidirectionalRange and

spir denis.spir at gmail.com
Mon Jan 17 07:14:19 PST 2011


On 01/15/2011 08:51 PM, Steven Schveighoffer wrote:
>> More over, Even if you ignore Hebrew as a tiny insignificant minority
>> you cannot do the same for Arabic which has over one *billion* people
>> that use that language.
>
> I hope that the medium type works 'good enough' for those languages,
> with the high level type needed for advanced usages.  At a minimum,
> comparison and substring should work for all languages.

Hello Steven,

How does an application know that a given text, which is supposedly
written in a given natural language (as indicated, for instance, by an
HTML header), does not also contain terms from other languages? There
are various occasions for this: quotations, foreign words, pointers...
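To illustrate why a declared language says little about the actual
content, here is a small Python sketch (Python is used only for
illustration; this thread is about D). It counts letters per script by
taking the first word of each character's Unicode name. This is a crude
heuristic I am assuming for the example, not the real Unicode Script
property, and `scripts_used` is a hypothetical helper:

```python
import unicodedata
from collections import Counter

def scripts_used(text: str) -> Counter:
    """Count letters per script, guessed from the first word of each
    character's Unicode name (crude heuristic, NOT the Script property)."""
    counts = Counter()
    for ch in text:
        # Only letters (general categories Lu, Ll, Lt, Lm, Lo) are counted.
        if unicodedata.category(ch).startswith("L"):
            name = unicodedata.name(ch, "")  # "" if the code point is unnamed
            if name:
                # e.g. "LATIN SMALL LETTER E" -> "LATIN",
                #      "HEBREW LETTER SHIN"   -> "HEBREW"
                counts[name.split()[0]] += 1
    return counts

# A sentence that an HTML header might label as English,
# but which quotes a Hebrew word (shalom).
sample = "He greeted us with \u05e9\u05dc\u05d5\u05dd and left."
print(scripts_used(sample))
```

Even this rough count shows two scripts in one "English" sentence; an
app relying only on the header would miss the Hebrew part.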

A side issue is raised by precomposed codes for composite characters.
I guess (though I am not sure) that for most of the world's languages,
all "official" characters have single-code representations. Good, but
unfortunately the standard does not enforce their use (instead, the
decomposed form can sensibly be considered the base form, but that is
another topic).
So even if one knows for sure that all characters of all texts an app
will ever deal with can be mapped to single codes, to be safe one would
have to normalise to NFC (Normalization Form C, canonical composition)
anyway. Then where is the actual gain? In fact, it is a loss, because
NFC is more costly than NFD (Normalization Form D, canonical
decomposition): the standard NFC algorithm first decomposes to NFD, to
obtain a unique representation that can then be (re)composed more
easily via simple mappings.
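A minimal Python sketch of the problem (again only an illustration,
using the standard `unicodedata.normalize` function; `same_text` is a
hypothetical helper): the same user-perceived character can arrive
precomposed or decomposed, so comparison and substring search only
behave once both sides are normalised to the same form.

```python
import unicodedata

precomposed = "\u00e9"   # é as one code point: LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # 'e' followed by COMBINING ACUTE ACCENT

# Same character for a reader, yet naive code-point comparison disagrees.
print(precomposed == decomposed)              # False
print(len(precomposed), len(decomposed))      # 1 2
print("cafe\u0301".find(precomposed))         # -1: substring search misses it

def same_text(a: str, b: str, form: str = "NFC") -> bool:
    """Compare two strings after normalising both to the same form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

print(same_text(precomposed, decomposed))         # True
print(same_text(precomposed, decomposed, "NFD"))  # True

# NFC is canonical decomposition followed by canonical composition,
# so normalising an already-decomposed string to NFC gives the same result.
s = "Cafe\u0301 cr\u00e8me"
assert unicodedata.normalize("NFC", s) == unicodedata.normalize(
    "NFC", unicodedata.normalize("NFD", s))
```

Either form works for comparison as long as both sides use the same
one; the choice between NFC and NFD is then a matter of cost and of
which representation the rest of the app expects.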

For further information:
Unicode's normalisation algorithms: http://unicode.org/reports/tr15/
list of technical reports: http://unicode.org/reports/
(Unicode's technical reports are far more readable than the standard
itself, but unfortunately they often refer to it.)

Denis
_________________
vita es estrany
spir.wikidot.com



More information about the Digitalmars-d mailing list