Why I chose D over Ada and Eiffel
monarch_dodra
monarchdodra at gmail.com
Tue Aug 20 06:56:05 PDT 2013
On Tuesday, 20 August 2013 at 12:59:13 UTC, Andrej Mitrovic wrote:
> On 8/19/13, Ramon <spam at thanks.no> wrote:
>> Plus UTF, too. Even UTF-8, 16 (a very practical compromise in
>> my mind's eye because with 16 bits one can deal with *every*
>> language while still not wasting memory).
>
> UTF-8 can deal with every language as well. But perhaps you meant
> something else here.
>
> Anyway welcome aboard!
I think he meant that every "modern spoken/written" language fits
in the "Basic Multilingual Plane", where each code point fits
in a single UTF-16 code unit (2 bytes). Multi-code-unit
encodings (surrogate pairs) in UTF-16 are *very* rare.
On the other hand, if you encode Japanese into UTF-8, then you'll
spend *3* bytes per code point, ergo, "wasted memory".
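To put numbers on that, here is a minimal sketch (Python, purely for illustration; `encoded_sizes` is a made-up helper, not a library function) that counts the bytes a string takes in each encoding:

```python
def encoded_sizes(text: str) -> dict:
    """Return how many bytes `text` occupies in UTF-8 and in UTF-16."""
    return {
        "code_points": len(text),
        "utf8_bytes": len(text.encode("utf-8")),
        # "utf-16-le" avoids the 2-byte byte-order mark that "utf-16" prepends
        "utf16_bytes": len(text.encode("utf-16-le")),
    }

japanese = "日本語"        # three code points, all inside the BMP
emoji = "\U0001F600"       # one code point outside the BMP

sizes_ja = encoded_sizes(japanese)
sizes_emoji = encoded_sizes(emoji)

print(sizes_ja)      # 3 bytes per code point in UTF-8, 2 in UTF-16
print(sizes_emoji)   # needs a surrogate pair in UTF-16, 4 bytes either way
```

So for pure Japanese text UTF-16 does come out ahead, and outside the BMP the two encodings tie at 4 bytes per code point.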
@ Ramon:
I think that is a fallacy:
http://en.wikipedia.org/wiki/UTF-8#Compared_to_UTF-16
Real-world usage is *dominated* by ASCII chars. Unless you have a
very specific use case, UTF-8 will occupy *less* room than
UTF-16, even if it contains a lot of foreign characters.
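As a rough illustration of the point (a hypothetical snippet I made up, not a measurement of real-world data), take a bit of markup wrapped around Japanese text and compare the two encodings:

```python
# Hypothetical example: ASCII markup around Japanese text.
snippet = '<p class="greeting">こんにちは、世界</p>'

utf8_size = len(snippet.encode("utf-8"))
# "utf-16-le" so that no byte-order mark is counted
utf16_size = len(snippet.encode("utf-16-le"))

# 24 ASCII chars + 8 Japanese chars:
#   UTF-8:  24 * 1 + 8 * 3 = 48 bytes
#   UTF-16: 32 * 2         = 64 bytes
print(utf8_size, utf16_size)
```

The ASCII markup costs 1 byte per char in UTF-8 instead of 2, which more than pays for the extra byte on each Japanese char. How it works out for your data obviously depends on the ASCII-to-non-ASCII ratio.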
Furthermore, UTF-8 is pretty much the "standard". If you keep
UTF-16, you will probably end up regularly transcoding to UTF-8
to interface with char* functions.
Arguably, the "only" (IMO) use case for UTF-16 is interfacing
with Windows' UCS-2 API. But even then, there'll still be some
overhead, to make sure you don't have any dual-encoded code points
(surrogate pairs) in your streams.
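The kind of check I have in mind could be sketched like this (again Python for illustration; `has_surrogates` is a hypothetical helper, and I'm assuming little-endian UTF-16 input without a BOM):

```python
import struct

def has_surrogates(utf16le: bytes) -> bool:
    """True if any 16-bit code unit lies in the surrogate range D800-DFFF,
    i.e. the data is not plain UCS-2."""
    if len(utf16le) % 2:
        raise ValueError("UTF-16 data must have an even number of bytes")
    # Unpack the buffer as little-endian unsigned 16-bit code units
    units = struct.unpack(f"<{len(utf16le) // 2}H", utf16le)
    return any(0xD800 <= u <= 0xDFFF for u in units)

bmp_only = "日本語".encode("utf-16-le")
with_pair = "\U0001F600".encode("utf-16-le")

print(has_surrogates(bmp_only))   # BMP text: no surrogates
print(has_surrogates(with_pair))  # code point above U+FFFF: surrogate pair
```

That is a full pass over the data before you can treat it as fixed-width, which is the overhead I mean.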