Why UTF-8/16 character encodings?
Joakim
joakim at airpost.net
Fri May 24 13:45:35 PDT 2013
On Friday, 24 May 2013 at 20:37:58 UTC, Joakim wrote:
>> 3. Even if I have a string that is 99% ASCII then I have to
>> pay extra bytes for every character just because 1% wasn't
>> ASCII. With UTF-8, I only pay the extra bytes when needed.
> I don't understand what you mean here. If your string has a
> thousand non-ASCII characters, the UTF-8 version will have one
> or two thousand more characters, ie 1 or 2 KB more. My format
> would add a couple bytes in the header for each non-ASCII
> language character used, that's it. It's a clear win for my
> format.
Sorry, I was a bit imprecise. Here's what I meant to write:
I don't understand what you mean here. If your string has a
thousand non-ASCII characters, the UTF-8 version will have one
or two thousand more bytes, i.e. 1 or 2 KB more. My format
would add a couple of bytes in the header for each non-ASCII
language used, that's it. It's a clear win for my format.
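
To make the arithmetic above concrete, here is a rough Python sketch that compares the two overheads. The helper names are hypothetical, and the header cost of 2 bytes per language is only an assumed stand-in for "a couple of bytes"; the post does not specify the header layout. It also assumes the body of the header-based format stores every character in one byte.

```python
def utf8_overhead(text: str) -> int:
    """Extra bytes UTF-8 needs beyond one byte per code point."""
    return len(text.encode("utf-8")) - len(text)


def header_overhead(languages_used: int, bytes_per_language: int = 2) -> int:
    """Assumed header cost: a fixed number of bytes per non-ASCII language.

    The default of 2 is an illustrative guess, not a figure from the post.
    """
    return languages_used * bytes_per_language


# A thousand non-ASCII characters from a single language.
cyrillic = "я" * 1000    # U+044F takes 2 bytes in UTF-8
devanagari = "क" * 1000  # U+0915 takes 3 bytes in UTF-8

print(utf8_overhead(cyrillic))    # 1000 extra bytes
print(utf8_overhead(devanagari))  # 2000 extra bytes
print(header_overhead(1))         # 2 extra bytes under the assumed header cost
```

The two UTF-8 figures correspond to the "one or two thousand more bytes" above. The header figure depends entirely on the assumed per-language cost.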