VLERange: a range in between BidirectionalRange and
spir
denis.spir at gmail.com
Mon Jan 17 07:14:19 PST 2011
On 01/15/2011 08:51 PM, Steven Schveighoffer wrote:
>> Moreover, even if you ignore Hebrew as a tiny insignificant minority
>> you cannot do the same for Arabic which has over one *billion* people
>> that use that language.
>
> I hope that the medium type works 'good enough' for those languages,
> with the high level type needed for advanced usages. At a minimum,
> comparison and substring should work for all languages.
Hello Steven,
How does an application know that a given text, which supposedly is
written in a given natural language (as indicated, for instance, by an
HTML header), does not also hold terms from other languages? There are
various occasions for this: quotations, use of foreign words, pointers...
A side issue is raised by precomposed codes for composite characters.
For most languages of the world, I guess (but am not sure), all
"official" characters have single-code representations. Good, but
unfortunately this is not enforced by the standard (instead, the
decomposed form can sensibly be considered the base form, but this is
another topic).
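To make the point concrete, here is a small sketch in Python (chosen
only for illustration; the helper name code_points is mine). It shows
the same user-perceived character written once as a single precomposed
code and once as a base letter plus a combining mark. Both are valid
Unicode, yet they differ when compared code by code:

```python
# Two spellings of the same user-perceived character "é".
precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE (one code)
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT (two codes)


def code_points(text):
    """Return the code points of text as a list of 'U+XXXX' strings."""
    return ["U+%04X" % ord(ch) for ch in text]


# A plain comparison works on code points, so the two spellings differ.
same_codes = precomposed == decomposed

print(code_points(precomposed))  # ['U+00E9']
print(code_points(decomposed))   # ['U+0065', 'U+0301']
print(same_codes)                # False
```

So an application cannot assume the single-code form merely because
such a form exists for every character it expects.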
So even if one knows for sure that all characters of all texts an app
will ever deal with can be mapped to single codes, to be safe one would
have to normalise to NFC anyway (Normalization Form C, the composed
form). Then, where is the actual gain? In fact, it is a loss, because
NFC is more costly than NFD (Normalization Form D, the decomposed
form): the standard NFC algorithm first decomposes to NFD to get a
unique representation, which can then be more easily (re)composed via
simple mappings.
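As a rough illustration, assuming Python and its standard unicodedata
module (the wrappers nfc and nfd and the sample strings are mine), the
sketch below checks that either normal form makes the two spellings of
"é" compare equal, that composing from the decomposed form gives the
same result as composing directly, and that NFC does not guarantee one
code per character when no precomposed code exists:

```python
import unicodedata


def nfc(text):
    """Normalise text to Normalization Form C (composed)."""
    return unicodedata.normalize("NFC", text)


def nfd(text):
    """Normalise text to Normalization Form D (decomposed)."""
    return unicodedata.normalize("NFD", text)


precomposed = "\u00e9"   # "é" as a single code
decomposed = "e\u0301"   # "e" + COMBINING ACUTE ACCENT

# After normalising to either form, both spellings are identical.
equal_in_nfc = nfc(precomposed) == nfc(decomposed)
equal_in_nfd = nfd(precomposed) == nfd(decomposed)

# Composing from the decomposed form gives the same result as NFC directly.
roundtrip_ok = nfc(nfd(precomposed)) == nfc(precomposed)

# "q" + COMBINING TILDE has no precomposed code, so NFC keeps two codes.
no_single_code = "q\u0303"
still_two_codes = len(nfc(no_single_code)) == 2

print(equal_in_nfc, equal_in_nfd, roundtrip_ok, still_two_codes)
```

This says nothing about relative cost; it only shows that a
normalisation pass is needed before comparison, and that even NFC text
can hold multi-code characters.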
For further information:
Unicode's normalisation algos: http://unicode.org/reports/tr15/
list of technical reports: http://unicode.org/reports/
(Unicode's technical reports are far more readable than the standard
itself, but unfortunately often refer to it.)
Denis
_________________
vita es estrany
spir.wikidot.com