The Case Against Autodecode

Fri Jun 3 03:14:15 PDT 2016

On Friday, 3 June 2016 at 10:08:43 UTC, Walter Bright wrote:
> On 6/3/2016 1:05 AM, H. S. Teoh via Digitalmars-d wrote:
>> At the time
>> Unicode also had to grapple with tricky issues like what to do 
>> with
>> lookalike characters that served different purposes or had 
>> different
>> meanings, e.g., the mu sign in the math block vs. the real 
>> letter mu in
>> the Greek block, or the Cyrillic A which looks and behaves 
>> exactly like
>> the Latin A, yet the Cyrillic Р, which looks like the Latin P, 
>> does
>> *not* mean the same thing (it's the equivalent of R), or the 
>> Cyrillic В
>> whose lowercase is в not b, and also had a different sound, but
>> lowercase Latin b looks very similar to Cyrillic ь, which 
>> serves a
>> completely different purpose (the uppercase is Ь, not B, you 
>> see).
>
> I don't see that this is tricky at all. Adding additional 
> semantic meaning that does not exist in printed form was 
> outside of the charter of Unicode. Hence there is no 
> justification for having two distinct characters with identical 
> glyphs.

That's not right either. Cyrillic letters can look slightly 
different from their Latin lookalikes in some circumstances.

I'm sure there are extremely good reasons for not using the Latin 
lookalikes in the Cyrillic alphabets, because most (all?) 8-bit 
Cyrillic encodings use separate codes for the lookalikes. This is 
not restricted to Unicode.
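
To make the point about 8-bit encodings concrete, here is a minimal sketch in Python (chosen only for convenience). The four codecs listed are my own pick of common legacy Cyrillic encodings and the helper name `lookalike_codes` is hypothetical; none of this comes from the thread. It encodes Latin A and Cyrillic А in each codec and reports the byte each one gets.

```python
# Illustrative sketch: the codec list and helper name are assumptions,
# picked as examples of legacy 8-bit Cyrillic encodings.
LATIN_A = "A"          # U+0041 LATIN CAPITAL LETTER A
CYRILLIC_A = "\u0410"  # U+0410 CYRILLIC CAPITAL LETTER A

CODECS = ["koi8_r", "cp1251", "iso8859_5", "cp866"]


def lookalike_codes(codecs=CODECS):
    """Map each codec name to (Latin A byte, Cyrillic A byte)."""
    result = {}
    for name in codecs:
        latin = LATIN_A.encode(name)[0]        # single-byte encodings
        cyrillic = CYRILLIC_A.encode(name)[0]
        result[name] = (latin, cyrillic)
    return result


if __name__ == "__main__":
    for name, (latin, cyrillic) in lookalike_codes().items():
        print(f"{name:10} Latin A = 0x{latin:02X}, "
              f"Cyrillic A = 0x{cyrillic:02X}")
```

In each of these codecs the Latin A stays at its ASCII position while the Cyrillic А is assigned a separate code in the upper half of the byte range, so the two lookalikes were already kept apart before Unicode.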