VLERange: a range in between BidirectionalRange and

Steven Schveighoffer schveiguy at yahoo.com
Mon Jan 17 07:20:31 PST 2011


On Mon, 17 Jan 2011 10:14:19 -0500, spir <denis.spir at gmail.com> wrote:

> On 01/15/2011 08:51 PM, Steven Schveighoffer wrote:
>>> Moreover, even if you ignore Hebrew as a tiny insignificant minority
>>> you cannot do the same for Arabic which has over one *billion* people
>>> that use that language.
>>
>> I hope that the medium type works 'good enough' for those languages,
>> with the high level type needed for advanced usages.  At a minimum,
>> comparison and substring should work for all languages.
>
> Hello Steven,
>
> How does an application know that a given text, which supposedly is  
> written in a given natural language (as for instance indicated by an  
> html header) does not also hold terms from other languages? There are  
> various occasions for this: quotations, use of foreign words, pointers...
>
> A side-issue is raised by precomposed codes for composite characters.  
> For most languages of the world, I guess (but unsure), all "official"  
> characters have single-code representations. Good, but unfortunately  
> this is not enforced by the standard (instead, the decomposed form can  
> sensibly be considered the base form, but this is another topic).
> So even if one knows for sure that all characters of all texts an  
> app will ever deal with can be mapped to single codes, to be safe one  
> would have to normalise to NFC anyway (Normalization Form C, the  
> composed form). Then, where is the actual gain? In fact, it is a loss  
> because NFC is more costly than NFD (the decomposed form): the  
> standard NFC algorithm first decomposes to NFD to get a unique  
> representation that can then be more easily (re)composed via simple  
> mappings.
>
> For further information:
> Unicode's normalisation algos: http://unicode.org/reports/tr15/
> list of technical reports: http://unicode.org/reports/
> (Unicode's technical reports are far more readable than the standard  
> itself, but unfortunately often refer to it.)
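
To make the precomposed/decomposed point in the quoted text concrete, here is a minimal sketch in Python (chosen only because its standard library ships `unicodedata.normalize`; it is not D code, and the helper name `same_text` is hypothetical). It shows that the same visible character can arrive as one code point or as two, and that the two spellings only compare equal after both sides are normalised to the same form.

```python
import unicodedata

# Two spellings of the same visible character "é".
precomposed = "\u00e9"    # single code point: LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"    # "e" followed by COMBINING ACUTE ACCENT


def same_text(a, b, form="NFC"):
    """Compare two strings after normalising both to the same form.

    Hypothetical helper for illustration; `form` is any form name accepted
    by unicodedata.normalize ("NFC", "NFD", "NFKC", "NFKD").
    """
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# A raw code point comparison treats the two spellings as different text.
raw_equal = precomposed == decomposed

# After normalisation they agree, whichever canonical form is chosen.
nfc_equal = same_text(precomposed, decomposed, "NFC")
nfd_equal = same_text(precomposed, decomposed, "NFD")

# NFC yields the single-code form, NFD the two-code form.
composed_length = len(unicodedata.normalize("NFC", decomposed))
decomposed_length = len(unicodedata.normalize("NFD", precomposed))
```

The sketch says nothing about relative cost; it only illustrates why an application cannot assume single-code input without normalising first.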

I'll reply to this to save you the trouble.  I have reversed my position  
since writing a lot of these posts.

In summary, I think strings should default to an element type of a  
grapheme, which should be implemented via a slice of the original data.   
Updated string type forthcoming.
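
As a rough sketch of what "an element type of a grapheme, implemented via a slice of the original data" could look like, here is a simplified illustration in Python. It is not the forthcoming string type, and the names `grapheme_slices` and `graphemes` are hypothetical. It also assumes a much cruder rule than the full Unicode segmentation algorithm (UAX #29): a grapheme is taken to be one code point plus any combining marks that follow it.

```python
import unicodedata


def grapheme_slices(text):
    """Yield (start, end) index pairs into `text`, one pair per grapheme.

    Simplifying assumption: a grapheme is a code point followed by zero or
    more code points with a non-zero canonical combining class. Real
    grapheme segmentation (UAX #29) has more rules than this.
    """
    start = 0
    for i in range(1, len(text)):
        # A code point with combining class 0 starts a new grapheme.
        if unicodedata.combining(text[i]) == 0:
            yield (start, i)
            start = i
    if text:
        yield (start, len(text))


def graphemes(text):
    """Return each grapheme as a slice of the original string."""
    return [text[a:b] for a, b in grapheme_slices(text)]


# "é" spelled as two code points, followed by a plain "a".
sample = "e\u0301a"
parts = graphemes(sample)   # two elements, although len(sample) is 3
```

Note that slicing a Python string copies the characters, whereas a D slice is a view onto the original array, so the `(start, end)` pairs are the closer analogue of the idea.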

-Steve

