Why I chose D over Ada and Eiffel

monarch_dodra monarchdodra at gmail.com
Tue Aug 20 06:56:05 PDT 2013


On Tuesday, 20 August 2013 at 12:59:13 UTC, Andrej Mitrovic wrote:
> On 8/19/13, Ramon <spam at thanks.no> wrote:
>>    Plus UTF, too. Even UTF-8, 16 (a very practical compromise 
>> in
>> my minds eye because with 16 bits one can deal with *every*
>> language while still not wasting memory).
>
> UTF-8 can deal with every language as well. But perhaps you 
> meant
> something else here.
>
> Anyway welcome aboard!

I think he meant that every "modern spoken/written" language fits 
in the "Basic Multilingual Plane", in which each code point fits 
in a single UTF-16 code unit (2 bytes). Multiple code unit 
encodings (surrogate pairs) in UTF-16 are *very* rare.
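
To make the code unit point concrete, here is a rough sketch in 
Python (just for illustration; the helper name utf16_code_units is 
made up). It counts how many 16-bit units a string needs once 
encoded as UTF-16:

```python
def utf16_code_units(text: str) -> int:
    """Count the 16-bit code units needed to encode text as UTF-16."""
    # "utf-16-le" emits no byte order mark, so every 2 bytes is one unit.
    return len(text.encode("utf-16-le")) // 2


bmp_char = "\u65e5"          # a kanji, inside the Basic Multilingual Plane
astral_char = "\U0001F600"   # an emoji, outside the BMP

print(utf16_code_units(bmp_char))     # one code unit
print(utf16_code_units(astral_char))  # two code units (a surrogate pair)
```

Anything above U+FFFF takes the two-unit path, which is the rare 
case mentioned above.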

On the other hand, if you encode Japanese into UTF-8, then you'll 
spend *3* bytes per code point, ergo, "wasted memory".
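
A quick way to check those numbers (a Python sketch; the sample 
string and the helper name encoded_sizes are my own choice):

```python
def encoded_sizes(text: str) -> tuple[int, int]:
    """Return (UTF-8 byte count, UTF-16 byte count) for text."""
    utf8_len = len(text.encode("utf-8"))
    # "utf-16-le" avoids counting the 2-byte byte order mark.
    utf16_len = len(text.encode("utf-16-le"))
    return utf8_len, utf16_len


japanese = "\u65e5\u672c\u8a9e"  # three code points: "Japanese language"
utf8_len, utf16_len = encoded_sizes(japanese)
print(utf8_len, utf16_len)  # 3 bytes each in UTF-8, 2 bytes each in UTF-16
```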

@ Ramon:
I think that is a fallacy:
http://en.wikipedia.org/wiki/UTF-8#Compared_to_UTF-16
Real world usage is *dominated* by ASCII chars (markup, 
whitespace, digits). Unless you have a very specific use case, 
UTF-8 will occupy *less* room than UTF-16, even if the text 
contains a lot of foreign characters.
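
As a small illustration of that trade-off, here is a Python sketch 
comparing sizes for text that mixes ASCII markup with Japanese. The 
sample strings are invented for the example, not measured from any 
real corpus:

```python
def size_report(text: str) -> dict[str, int]:
    """Byte counts of text in UTF-8 and UTF-16 (no byte order mark)."""
    return {
        "utf8": len(text.encode("utf-8")),
        "utf16": len(text.encode("utf-16-le")),
    }


# Hypothetical samples: pure Japanese text vs. the same text in markup.
pure = "\u65e5\u672c\u8a9e"
marked_up = '<p class="note">\u65e5\u672c\u8a9e</p>'

for label, sample in (("pure", pure), ("marked up", marked_up)):
    report = size_report(sample)
    print(label, report["utf8"], report["utf16"])
```

For the pure sample UTF-16 is smaller, but as soon as the ASCII 
markup is added, UTF-8 wins, because every ASCII char costs 1 byte 
instead of 2.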

Furthermore, UTF-8 is pretty much the "standard". If you keep 
UTF-16, you will probably end up regularly transcoding to UTF-8 
to interface with char* functions.

Arguably, the "only" (IMO) use case for UTF-16 is interfacing 
with Windows' UCS-2 API. But even then, there'll still be some 
overhead, to make sure you don't have any dual-encoded code 
points (surrogate pairs) in your streams.
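
That check could look something like the following Python sketch. 
It assumes the stream is little-endian UTF-16 without a byte order 
mark, and the function name has_surrogates is hypothetical:

```python
def has_surrogates(utf16le: bytes) -> bool:
    """Return True if any 16-bit unit lies in the surrogate range."""
    # Walk the buffer two bytes at a time; a trailing odd byte is ignored.
    for i in range(0, len(utf16le) - 1, 2):
        unit = int.from_bytes(utf16le[i:i + 2], "little")
        # U+D800..U+DFFF is reserved for surrogates, so a unit in this
        # range means the data is not plain one-unit-per-code-point text.
        if 0xD800 <= unit <= 0xDFFF:
            return True
    return False


print(has_surrogates("\u65e5\u672c\u8a9e".encode("utf-16-le")))
print(has_surrogates("\U0001F600".encode("utf-16-le")))
```

It is a full pass over the data, which is the overhead in question.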

