VLERange: a range in between BidirectionalRange and
Steven Schveighoffer
schveiguy at yahoo.com
Mon Jan 17 07:20:31 PST 2011
On Mon, 17 Jan 2011 10:14:19 -0500, spir <denis.spir at gmail.com> wrote:
> On 01/15/2011 08:51 PM, Steven Schveighoffer wrote:
>>> Moreover, even if you ignore Hebrew as a tiny insignificant minority
>>> you cannot do the same for Arabic which has over one *billion* people
>>> that use that language.
>>
>> I hope that the medium type works 'good enough' for those languages,
>> with the high level type needed for advanced usages. At a minimum,
>> comparison and substring should work for all languages.
>
> Hello Steven,
>
> How does an application know that a given text, which supposedly is
> written in a given natural language (as for instance indicated by an
> html header) does not also hold terms from other languages? There are
> various occasions for this: quotations, use of foreign words, pointers...
>
> A side-issue is raised by precomposed codes for composite characters.
> For most languages of the world, I guess (but unsure), all "official"
> characters have single-code representations. Good, but unfortunately
> this is not enforced by the standard (instead, the decomposed form can
> sensibly be considered the base form, but this is another topic).
> So that even if one knows for sure that all characters of all texts an
> app will ever deal with can be mapped to single codes, to be safe one
> would have to normalise to NFC anyway (Normalised Form Composed). Then,
> where is the actual gain? In fact, it is a loss because NFC is more
> costly than NFD (Decomposed) --actually, the standard NFC algo first
> decomposes to NFD to initially get a unique representation that can
> then be more easily (re)composed via simple mappings.
>
> For further information:
> Unicode's normalisation algos: http://unicode.org/reports/tr15/
> list of technical reports: http://unicode.org/reports/
> (Unicode's technical reports are far more readable than the standard
> itself, but unfortunately often refer to it.)
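
To make the precomposed/decomposed point above concrete, here is a minimal
sketch. It is written in Python rather than D, purely because Python's
standard library ships `unicodedata.normalize`; the variable names are
illustrative and not taken from any proposed string type. It shows that the
same visible character can be stored as two different code point sequences,
and that comparison only works reliably once both sides are normalised to the
same form.

```python
import unicodedata

precomposed = "\u00e9"   # 'é' as one code point (LATIN SMALL LETTER E WITH ACUTE)
decomposed = "e\u0301"   # 'e' followed by COMBINING ACUTE ACCENT

# Same character to a reader, but the raw code point sequences differ.
same_raw = precomposed == decomposed

# After normalising both sides to the same form, they compare equal.
nfc_equal = (unicodedata.normalize("NFC", precomposed)
             == unicodedata.normalize("NFC", decomposed))
nfd_equal = (unicodedata.normalize("NFD", precomposed)
             == unicodedata.normalize("NFD", decomposed))

# NFC composes to a single code point; NFD expands to base + combining mark.
nfc_len = len(unicodedata.normalize("NFC", decomposed))
nfd_len = len(unicodedata.normalize("NFD", precomposed))

print(same_raw, nfc_equal, nfd_equal, nfc_len, nfd_len)
```

Either form works for comparison as long as both operands use the same one;
which of the two is cheaper to compute depends on the implementation.
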
I'll reply to this to save you the trouble. I have reversed my position
since writing a lot of these posts.

In summary, I think strings should default to an element type of a
grapheme, which should be implemented via a slice of the original data.

Updated string type forthcoming.

-Steve
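
As a rough illustration of "a grapheme implemented via a slice of the original
data", the sketch below walks a string and yields index ranges into it, one
per grapheme. It is Python, not D, and it is not the forthcoming string type.
The function `grapheme_slices` is a hypothetical helper, and it assumes a
deliberately simplified rule: a grapheme is a base code point plus any
following code points with a non-zero canonical combining class. Full
grapheme segmentation as defined by Unicode covers more cases than this.

```python
import unicodedata

def grapheme_slices(text):
    """Return (start, end) index pairs, one per approximate grapheme.

    Simplifying assumption: a new grapheme starts at every code point whose
    canonical combining class is 0; combining marks attach to the preceding
    base. This is not the complete Unicode segmentation algorithm.
    """
    slices = []
    start = 0
    for i in range(1, len(text)):
        if unicodedata.combining(text[i]) == 0:
            slices.append((start, i))
            start = i
    if text:
        slices.append((start, len(text)))
    return slices

text = "cafe\u0301s"            # 'e' + COMBINING ACUTE ACCENT in the middle
slices = grapheme_slices(text)
graphemes = [text[a:b] for a, b in slices]

print(len(text), len(graphemes))   # code points vs. approximate graphemes
print(slices)
```

Each element is identified by a range into the original string, so nothing
needs to be re-encoded to iterate by grapheme. Note that Python copies on
`text[a:b]`, whereas a D slice would refer to the original array directly.
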