VLERange: a range in between BidirectionalRange and RandomAccessRange
spir
denis.spir at gmail.com
Wed Jan 19 01:15:50 PST 2011
On 01/18/2011 06:11 AM, Ali Çehreli wrote:
> Thanks to all who have contributed, I am also following this thread with
> great interest. :)
>
> Michel Fortin wrote:
> > I mean, a grapheme is a slice of a string, can have multiple code points
> > (like a string), can be appended the same way as a string, can be
> > composed or decomposed using canonical normalization or compatibility
> > normalization (like a string), and should be sorted, uppercased, and
> > lowercased according to Unicode rules (like a string). Basically, a
> > grapheme is just a string that happens to contain only one grapheme.
>
> I would like to stress the fact that Unicode knows nothing about
> sorting, uppercasing, or lowercasing.
>
> Those operations are tied to the alphabet (or writing system) that a
> certain grapheme happens to belong to at a given time. For example, we
> cannot uppercase the letter i without knowing what alphabet we are
> dealing with. Two possibilities: I and İ (I dot above).
>
> It is the same issue with sorting.
This is true and false ;-)
You are right, indeed, that issues like sorting are language-specific
and, moreover, use-case-specific. The Turkish case is a good example.
For another one, in French I do not even know whether there is an
official rule! Anyway, whatever the answer, even e.g. famous newspapers
and official documents have used different rules. Most of them drop
accents on uppercase, possibly because of computer limitations; there
is a recent move (back) toward accented uppercase. This is very
annoying: "Hélène" has 2 consistent uppercase versions, both in use.
Conversely, how is software supposed to guess the lowercase version of
"HELENE"?
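
To make the ambiguity concrete, here is a small sketch in Python (not D, just because its built-in case mapping is language-neutral and easy to try). The helper strip_accents is a name I made up for illustration; it is not part of any library. Take it as a rough demonstration, not as a statement of what any language's rules ought to be.

```python
import unicodedata

def strip_accents(text: str) -> str:
    """Hypothetical helper: decompose (NFD), then drop combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

name = "Hélène"

# Two uppercase versions, both found in real French text.
accented_upper = name.upper()               # accents kept
plain_upper = strip_accents(name).upper()   # accents dropped

# Lowercasing the unaccented form cannot bring the accents back.
round_trip = plain_upper.lower()

# The default mapping knows nothing about Turkish:
# 'i' uppercases to 'I', while Turkish text expects U+0130 (I with dot above).
default_upper_i = "i".upper()
turkish_upper_i = "\u0130"

print(accented_upper, plain_upper, round_trip)
print(default_upper_i == turkish_upper_i)
```

The information lost when accents are dropped is simply gone: nothing in "HELENE" tells software which vowels carried an accent.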
As for Unicode, it still defines norms for casing and so-called
collation (comparison, for sorting) algorithms. I don't know much more;
I have never applied them personally, for reasons like the ones above.
The full list of its technical docs can be found at
http://unicode.org/reports/. See in particular
http://unicode.org/reports/tr10/ for collation.
(Unfortunately, case mapping is now part of the core standard doc, so
it's hard to get at.)
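
To show why a collation algorithm is needed at all, here is a rough Python sketch comparing plain code-point order with a crude accent- and case-insensitive sort key. To be clear, this is NOT the algorithm described in tr10; accent_insensitive_key is a hypothetical helper of mine, and the word list is just made-up sample data.

```python
import unicodedata

def accent_insensitive_key(word: str):
    """Hypothetical sort key: compare base letters first, ignoring case
    and accents, then fall back to the original word as a tie-breaker."""
    decomposed = unicodedata.normalize("NFD", word)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return (base.casefold(), word)

# Normalize to NFC so the result does not depend on how this text was encoded.
words = [unicodedata.normalize("NFC", w) for w in ["zèbre", "école", "eau", "Zoé"]]

by_code_point = sorted(words)                        # raw code-point order
by_key = sorted(words, key=accent_insensitive_key)   # crude "dictionary" order

print(by_code_point)  # uppercase first, accented initials last
print(by_key)         # eau, école, zèbre, Zoé
```

Code-point order puts "école" after "zèbre" because é has a higher code point than z. The crude key fixes that for this sample, but it still encodes one fixed rule, which is exactly the language-specific and use-case-specific choice discussed above.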
Denis
_________________
vita es estrany
spir.wikidot.com