Why UTF-8/16 character encodings?

anonymous anonymous at example.com
Fri May 24 10:54:46 PDT 2013


On Friday, 24 May 2013 at 17:05:57 UTC, Joakim wrote:
> On Friday, 24 May 2013 at 09:49:40 UTC, Jacob Carlborg wrote:
>> toUpper/lower cannot be done in place if it should handle all 
>> Unicode. Some characters will change their length when converted 
>> to/from uppercase. Examples of these are the German double S 
>> and the Turkish dotted and dotless I.
>
> This triggered a long-standing bugbear of mine: why are we 
> using these variable-length encodings at all?  Does anybody 
> really care about UTF-8 being "self-synchronizing," i.e., does 
> anybody actually use that in this day and age?  Sure, it's 
> backwards-compatible with ASCII and the vast majority of usage 
> is probably just ASCII, but that means the other languages 
> don't matter anyway.  Not to mention taking the valuable 8-bit 
> real estate for English and dumping the longer encodings on 
> everyone else.

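The German ß becomes SS when capitalised: one code point turns into 
two, whatever the encoding, fixed-width UTF-32 included. That is a 
property of Unicode case mapping, not an encoding issue.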
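
To make the point concrete, here is a minimal sketch in Python (chosen only because its str.upper() applies the full Unicode case mapping; it is not meant to mirror how Phobos does it). The helper name code_units and the variables lower and upper are made up for this illustration. It counts code units before and after upper-casing in three encodings:

```python
# Illustration only: upper-casing changes the number of code points,
# so a fixed-width encoding would not make the conversion in-place.
lower = "straße"
upper = lower.upper()  # full case mapping: ß (U+00DF) becomes "SS"


def code_units(text, encoding):
    """Count the code units text occupies in the given encoding (hypothetical helper)."""
    unit_size = {"utf-8": 1, "utf-16-le": 2, "utf-32-le": 4}[encoding]
    return len(text.encode(encoding)) // unit_size


for text in (lower, upper):
    counts = {enc: code_units(text, enc) for enc in ("utf-8", "utf-16-le", "utf-32-le")}
    print(text, "code points:", len(text), counts)
```

The string grows from 6 to 7 code points, so the UTF-16 and UTF-32 forms both get one unit longer. The UTF-8 form happens to stay at 7 bytes, because ß takes two bytes and so does SS.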
