Why UTF-8/16 character encodings?
Joakim
joakim at airpost.net
Sat May 25 13:03:58 PDT 2013
On Saturday, 25 May 2013 at 19:30:25 UTC, Walter Bright wrote:
> On the other hand, Joakim even admits his single byte encoding
> is variable length, as otherwise he simply dismisses the rarely
> used (!) Chinese, Japanese, and Korean languages, as well as
> any text that contains words from more than one language.
I have noted from the beginning that these large alphabets have
to be encoded in two bytes, so it is not a true constant-width
encoding if you are mixing one of those languages into a
single-byte encoded string. But this "variable length" encoding
is so much simpler than UTF-8 that there's no comparison.
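For context on the "variable length" point (this sketch is not from the thread): UTF-8 is itself variable-width, using one to four bytes per code point, while UTF-16 uses two bytes for characters in the Basic Multilingual Plane (which includes the common CJK ideographs) and four bytes for anything beyond it. The helper name `byte_widths` is hypothetical; it only counts how many bytes each character occupies under a given standard Python codec.

```python
# Count how many bytes each character takes in a given encoding.
# "utf-16-le" is used so that no byte-order mark is added to each character.
def byte_widths(text: str, encoding: str) -> list[int]:
    """Return the encoded byte length of every character in `text`."""
    return [len(ch.encode(encoding)) for ch in text]

sample = "aЖ中"  # Latin letter, Cyrillic letter, CJK ideograph
print(byte_widths(sample, "utf-8"))      # [1, 2, 3]
print(byte_widths(sample, "utf-16-le"))  # [2, 2, 2]
```

So under UTF-8 a CJK ideograph costs three bytes and a Cyrillic letter two, whereas a scheme that spends two bytes only on the large alphabets trades that for some other way of marking which alphabet is in use.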
> I suspect he's trolling us, and quite successfully.
Ha, I wondered who would pull out this insult; I'm quite surprised
to see it's Walter. It seems to be the trend on the internet to
accuse anybody you disagree with of trolling, and I am honestly
surprised to see Walter stoop so low. Considering I'm the only
one making any cogent arguments here, perhaps I should wonder if
you're all trolling me. ;)
On Saturday, 25 May 2013 at 19:35:42 UTC, Walter Bright wrote:
> I suspect the Chinese, Koreans, and Japanese would take
> exception to being called irrelevant.
Irrelevant only because they are a small subset of the UCS. I
have noted that they would also be handled by a two-byte encoding.
> Good luck with your scheme that can't handle languages written
> by billions of people!
So let's see: first you say that my scheme has to be variable
length because I am using two bytes to handle these languages,
then you claim I don't handle these languages. This kind of
blatant contradiction within two posts can only be called...
trolling!
More information about the Digitalmars-d mailing list