Why UTF-8/16 character encodings?
Peter Alexander
peter.alexander.au at gmail.com
Fri May 24 10:43:00 PDT 2013
On Friday, 24 May 2013 at 17:05:57 UTC, Joakim wrote:
> This triggered a long-standing bugbear of mine: why are we
> using these variable-length encodings at all?
Simple: backwards compatibility with all ASCII APIs (e.g. most C
libraries), and because I don't want my strings to consume
multiple bytes per character when I don't need them to.
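
A rough illustration of both points, sketched in Python only for brevity. The sample string and variable names are made up for the example:

```python
# Hypothetical sample text containing only ASCII characters.
ascii_text = "hello, world"

# ASCII text produces identical bytes under ASCII and UTF-8, so
# byte-oriented APIs that expect ASCII can consume it unchanged.
assert ascii_text.encode("utf-8") == ascii_text.encode("ascii")

# Size of the same text in three encodings. The "-le" variants are
# used so that no byte order mark is added to the count.
utf8_size = len(ascii_text.encode("utf-8"))       # 1 byte per character here
utf16_size = len(ascii_text.encode("utf-16-le"))  # 2 bytes per character here
utf32_size = len(ascii_text.encode("utf-32-le"))  # 4 bytes per character

print(utf8_size, utf16_size, utf32_size)  # prints "12 24 48"
```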
Your language header idea is no good for at least three reasons:
1. What happens if I want to take a substring slice of your
string? I'll need to allocate a new string to add the header in.
2. What if I have a long string with the ASCII header and want to
append a non-ASCII character on the end? I'll need to reallocate
the whole string and widen it with the new header.
3. Even if I have a string that is 99% ASCII, I still have to pay
extra bytes for every character just because 1% wasn't ASCII.
With UTF-8, I only pay the extra bytes when needed.
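
The three points can be sketched roughly as follows, again in Python for brevity. The sample strings are hypothetical, and UTF-16-LE and UTF-32-LE stand in for a "widened" string with a fixed number of bytes per character, which is an assumption about what a header scheme would fall back to:

```python
# Hypothetical sample: 99 ASCII characters plus one non-ASCII character.
text = "a" * 99 + "\u00e9"  # U+00E9, e with acute accent

utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-le")
utf32 = text.encode("utf-32-le")

# Point 3: UTF-8 pays one extra byte for the single non-ASCII
# character; the wider encodings pay for all 100 characters.
sizes = (len(utf8), len(utf16), len(utf32))
print(sizes)  # prints "(101, 200, 400)"

# Point 1: a substring is just a byte range of the original buffer
# (cut on code point boundaries), so nothing has to be copied or
# re-headered. memoryview gives a zero-copy slice.
view = memoryview(utf8)[10:20]
substring = bytes(view).decode("utf-8")

# Point 2: appending a non-ASCII character only adds bytes at the
# end; the existing ASCII bytes are not rewritten or widened.
buf = bytearray(b"plain ascii")
before = bytes(buf)
buf += "\u00fc".encode("utf-8")  # U+00FC, u with diaeresis
unchanged_prefix = bytes(buf[:len(before)]) == before
added_bytes = len(buf) - len(before)
```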