Why UTF-8/16 character encodings?
Juan Manuel Cabo
juanmanuel.cabo at gmail.com
Sat May 25 13:20:09 PDT 2013
On Saturday, 25 May 2013 at 19:51:43 UTC, Joakim wrote:
> On Saturday, 25 May 2013 at 19:03:53 UTC, Dmitry Olshansky
> wrote:
>> You can map a codepage to a subset of UCS :)
>> That's what they do internally anyway.
>> If I take you right, you propose to define string as a header
>> that denotes a set of windows in code space? I still fail to
>> see how that would scale, see below.
> Something like that. For a multi-language string encoding, the
> header would contain a single byte for every language used in
> the string, along with multiple index bytes to signify the
> start and finish of every run of single-language characters in
> the string. So, a list of languages and a list of pure
> single-language substrings. This is just off the top of my
> head, I'm not suggesting it is definitive.
>
You obviously are not thinking it through. Such an encoding would
have O(n^2) complexity for building a string by appending
characters/symbols in different languages, since each such append
would have to update the header at the beginning of the string
and move the contents forward to make room. Not to mention that
it wouldn't be backwards compatible with ASCII routines, and that
the complexity of such a header would have to be carried all the
way to the font rendering routines in the OS.
Multiple languages/symbols in one string is a blessing of modern
humane computing. It is the norm rather than the exception in
most of the world.
--jm