Why UTF-8/16 character encodings?

Joakim joakim at airpost.net
Sat May 25 13:03:58 PDT 2013


On Saturday, 25 May 2013 at 19:30:25 UTC, Walter Bright wrote:
> On the other hand, Joakim even admits his single byte encoding 
> is variable length, as otherwise he simply dismisses the rarely 
> used (!) Chinese, Japanese, and Korean languages, as well as 
> any text that contains words from more than one language.
I have noted from the beginning that these large alphabets have 
to be encoded with two bytes, so it is not a true constant-width 
encoding if you are mixing one of those languages into a 
single-byte encoded string.  But this "variable-length" encoding 
is so much simpler than UTF-8 that there's no comparison.

> I suspect he's trolling us, and quite successfully.
Ha, I wondered who would pull out this insult, and I'm quite 
surprised to see it's Walter.  It seems to be the trend on the 
internet to accuse anybody you disagree with of trolling; I am 
honestly disappointed to see Walter stoop so low.  Considering 
I'm the only one making any cogent arguments here, perhaps I 
should wonder if you're all trolling me. ;)

On Saturday, 25 May 2013 at 19:35:42 UTC, Walter Bright wrote:
> I suspect the Chinese, Koreans, and Japanese would take 
> exception to being called irrelevant.
Irrelevant only in the sense that they are a small subset of the 
UCS.  I have noted that they would also be handled by a two-byte 
encoding.

> Good luck with your scheme that can't handle languages written 
> by billions of people!
So let's see: first you say that my scheme has to be variable 
length because I am using two bytes to handle these languages, 
then you claim I don't handle these languages.  This kind of 
blatant contradiction within two posts can only be called... 
trolling!

