Fix Phobos dependencies on autodecoding

H. S. Teoh hsteoh at quickfur.ath.cx
Thu Aug 15 22:16:04 UTC 2019


On Thu, Aug 15, 2019 at 02:42:50PM -0700, Walter Bright via Digitalmars-d wrote:
> On 8/15/2019 2:26 PM, a11e99z wrote:
[...]
> > if it was not sarcasm:
> > different code points can refer to the same glyph, not vice versa:
> > A(EN,\u0041), A(RU,\u0410), A(EL,\u0391)
> > otherwise sorting for non-English text will not work.
> > 
> > even the order (A<B) would be wrong; for example, these RU glyphs
> > ABCEHKMOPTXacepuxy
> > correspond to the following English letters by sound or meaning
> > AVSENKMORTHaserihu
> > as you can see, even the uppercase and lowercase forms don't exist
> > as pairs, and they have different meanings
> 
> Yes, I've heard this argument before.
> 
> The answer is that language should not be embedded in Unicode. It will
> lead to nothing but problems. The language is something externally
> assigned to a block of text, not the text itself, just like in printed
> text.
[...]

You cannot avoid conveying language in a string. Certain characters only
exist in certain languages, and the existence of the character itself
already encodes language. But that's a peripheral issue.

The more pertinent point is that *different* languages may reuse the
*same* glyphs for different (often completely unrelated) purposes. Those
different purposes change the way the *same* glyph is printed and laid
out, and may affect other things in the surrounding context as well.
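To make the look-alike point concrete, here is a small sketch in Python (chosen only for illustration; nothing here is D or Phobos API) using the standard unicodedata module. It assumes the three capitals from the quoted example are Latin U+0041, Cyrillic U+0410 and Greek U+0391, and shows that they are three distinct characters whose names carry their script, each lowercasing within its own script. The helper name describe() is hypothetical.

```python
import unicodedata

def describe(ch: str) -> str:
    """Return 'U+XXXX NAME' for a single character (hypothetical helper)."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

# The three look-alike capitals from the quoted example: Latin, Cyrillic, Greek.
HOMOGLYPHS = ["\u0041", "\u0410", "\u0391"]

for ch in HOMOGLYPHS:
    # Same shape on screen, but each lowercases to a letter of its own script.
    print(describe(ch), "->", describe(ch.lower()))

# Identical glyphs, yet three distinct characters.
assert len(set(HOMOGLYPHS)) == 3
```

Running it should print:

    U+0041 LATIN CAPITAL LETTER A -> U+0061 LATIN SMALL LETTER A
    U+0410 CYRILLIC CAPITAL LETTER A -> U+0430 CYRILLIC SMALL LETTER A
    U+0391 GREEK CAPITAL LETTER ALPHA -> U+03B1 GREEK SMALL LETTER ALPHA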

Put it this way: you agree that the encoding of a character ought not to
change depending on font, right?

If so, consider your proposal to identify characters by glyph shape. A
letter with the shape 'u', by your argument, ought to be represented by
one, and only one, Unicode code point -- because, after all, it has the
same glyph shape.  Correct?

If so, now you have a problem: the shape 'u' in Cyrillic is the cursive
lowercase form of и.  So now you're essentially saying that all
occurrences of 'u' in Cyrillic text must be substituted with и when you
change the font from cursive to non-cursive.  Which is a contradiction
of the initial axiom that character encoding should not be
font-dependent.
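The contradiction can also be shown mechanically. The Python sketch below is illustrative only: it assumes Unicode's default case mappings (which str.upper() applies) and uses a hypothetical "glyph-based" spelling in which a cursive и has been stored as Latin 'u'. Under the real encoding, the font never changes which character и is, so it still uppercases to И. Under the glyph-based spelling, the same word uppercases into a mix of Cyrillic and Latin letters.

```python
import unicodedata

CYRILLIC_I = "\u0438"  # и, which many cursive Cyrillic fonts draw like Latin 'u'
LATIN_U = "u"          # U+0075

def upper_name(ch: str) -> str:
    """Name of the uppercase form under Unicode default case mapping (hypothetical helper)."""
    return unicodedata.name(ch.upper())

print(upper_name(CYRILLIC_I))  # CYRILLIC CAPITAL LETTER I
print(upper_name(LATIN_U))     # LATIN CAPITAL LETTER U

# "мир" encoded as Cyrillic characters: м, и, р.
word = "\u043c\u0438\u0440"
# Hypothetical glyph-based spelling: the cursive и stored as Latin 'u'.
glyph_based = "\u043c" + "u" + "\u0440"

print(word.upper() == "\u041c\u0418\u0420")         # True: М, И, Р
print(glyph_based.upper() == "\u041c" + "U" + "\u0420")  # True: mixed-script result
```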

Please explain how you solve this problem.


T

-- 
Real men don't take backups. They put their source on a public FTP-server and let the world mirror it. -- Linus Torvalds

